release: canary → main (31 commits) — Python truth-layer + cross-platform install + 13 bug fixes#191
Conversation
…iteration step 1) (#151) refactor(adapters): iso_to_epoch dedupes BSD/GNU date split (3 callsites) Pre-fix: the BSD-vs-GNU 'date' fork had its own \`-j -u -f ... || date -u -d ...\` fallback chain at three callsites (heartbeat parse in cmd_connect, _format_relative_time, _is_stale). Each chain had slightly different error handling — heartbeat returned empty on parse-fail and skipped the staleness check; _format_relative_time echoed the raw ts; _is_stale returned 1. Three places, three slight variations of "the same idea." Future fixes (e.g. WSL date drift, Cygwin coreutils gaps) had to land at every site. Post-fix: single iso_to_epoch helper near the platform adapters block. Tries BSD → GNU → python3 datetime fallback. All three callsites route through it. Each callsite kept its OWN error handling (their semantics differ, that's fine — the parse layer is what was duplicated). Adds python3 fallback that didn't exist anywhere before — useful on minimal MSYS/Cygwin where neither date flavor parses. Unit-tested in scenario_platform_adapters with a known timestamp + empty + garbage inputs. Joel's directive 2026-04-27: "look for ways to keep these consistent, permanently." This is one pattern; the deeper bash↔PowerShell drift question is a separate architectural conversation (Python truth-layer candidate). Filing the architectural piece separately so the immediate adapter dedupe can ship without blocking on the bigger discussion. Test posture: - platform_adapters: 11/11 (was 8/8; +3 for iso_to_epoch) - list / rooms / ls: 4/4 (downstream consumer via _format_relative_time) - part_persists, part_keeps_sidecar: 8/8 + 6/6 (heartbeat path, unchanged behavior)
…and -v (#153) Bug found by continuum-b69f via cross-Mac/Windows substrate-bypass gist on 2026-04-27. Symptoms on Windows Git Bash: airc connect failed with "Can't reach 100.91.51.87:7547. Is the host running 'airc connect'?" even though Test-NetConnection succeeded on the port and a manual python socket connect to the same address completed the handshake. ## Root cause Modern Windows ships %LOCALAPPDATA%\Microsoft\WindowsApps\python3.exe — a Store-installer shim. The file exists, satisfies `command -v python3`, but invocation exits 49 with stderr "Python was not found; run without arguments to install from the Microsoft Store..." It is NOT a real interpreter. airc top-level (lines 17-31 pre-fix) gated python3 detection on `command -v` alone. The Store stub fooled the gate, so the python -> python3 shim NEVER installed. Every later `python3 -c "..."` inside the script — including the pair handshake at line 2495 — silently hit the Store stub, exited 49, and bash captured _pair_ok=0. The script then printed the misleading "Can't reach" message and discarded the captured stderr (the SECOND bug — see below). ## Fix 1. **airc top-level**: probe with `python3 --version >/dev/null 2>&1`, not bare `command -v`. Store stub fails fast → fallback to real `python` (also strict-probed) → if neither works, ERROR with a Windows-specific hint pointing at App execution aliases. 2. **die "Can't reach"**: print the captured handshake `$response` (stderr+stdout from 2>&1) before the die. Per the global "never swallow errors" rule — evidence is for the debugger, not the trash. Pre-fix, the actual Store-stub error was invisible to anyone trying to diagnose. 3. **_doctor_probe**: same strict --version probe. Distinguishes [BROKEN] (on PATH but stub) from [MISSING] (absent) so the fix hint matches the actual condition. Pre-fix `airc doctor` reported "[ok] python3" against the stub. 4. **install.sh prereq scan**: same strict probe in the installer's missing-prereq loop. Pre-fix, install.sh printed "All required prereqs present" against a stub-only Windows install, then airc immediately silent-fail-cascaded on first run. ## Why airc didn't catch this earlier Windows + Microsoft Store python3 alias is the default since ~Windows 10 1903. The stub is invisible to existence-only probes. Anyone who installs Python from python.org but doesn't disable the App execution aliases (the default state) hits this. Joel hit it after rebooting his Windows install today; continuum-b69f isolated it within ~5 min on the substrate-bypass gist. ## Test posture Manual: simulated Store stub locally with `exit 49` script on PATH: - Stub-only: ERROR with Windows-specific hint ✓ - Stub + real py: fallback shim activates, airc runs ✓ Mac integration: identity 19/19, whois 5/5, quit 9/9, away 5/5, list 4/4, part_persists running. ## Out of scope The deeper bash↔PowerShell drift problem (#152) remains. This PR fixes ONE symptom of that drift surfacing in production. Per Joel 2026-04-27: "make it work first then find patterns" — shipping the work-now fix; architectural unification is its own conversation.
…ral gists (#154) fix(sidecar): inherit --no-gist flag from primary so test fixtures stop leaking #general gists Bug found by continuum-b69f via cross-Mac/Windows substrate-bypass gist 2026-04-27. After the python3 detection fix landed on Windows (PR #153), continuum's airc connect resolved a #general gist that pointed at port 7556 — a Mac-side TEST FIXTURE corpse. Pre-fix: spawn_general_sidecar_if_wanted at airc:1159 spawned the sidecar with `--room general` only, ignoring the parent cmd_connect's `--no-gist` flag. Test scenarios (scenario_part_persists, scenario_general_sidecar_default, scenario_part_keeps_sidecar) spawn the primary with --no-gist --no-discovery to stay isolated, but the sidecar then went and PUBLISHED a real `airc room: general` gist on the live joelteply gh namespace. cleanup_all's `kill -9` bypasses the on-exit gist-delete trap, so the gist orphans forever. Real users discovering #general via auto-scope hit the orphan first (usually most-recent), try TCP to a port whose process exited 30 minutes ago, get RST, end up confused. ## Fix If `use_gist=0` (set by --no-gist on the primary), pass --no-gist to the sidecar spawn too. The flag inherits via the new `_sidecar_args` array. AIRC_NO_DISCOVERY=1 already inherits via subshell environment; only the flag needed explicit forwarding. ## Why integration tests didn't catch this The leakage happens on the live gh account. Integration tests run as Joel on his own gh account, so the leaked gists pollute his own substrate — invisible to test assertions, very visible to real users on the same gh account. Cross-account QA caught it (continuum-b69f's Windows tab discovered the orphan that Mac tests had created an hour earlier). ## Aftermath Already manually deleted 6 orphan gists post-cleanup (alpha #general + 5x cakr-test-*). With this fix, future test runs stop creating new ones. The trap-bypassed-by-kill-9 issue is a separate bug (test fixtures should kill politely). ## Test posture - part_keeps_sidecar: 6/6 - part_persists: 8/8 - general_sidecar_default: 12/12
…s limit will kill people) (#155) fix(gist): git-clone fallback + |\| true guards so rate-limit doesn't kill resolution Bug found by continuum-b69f mid-cross-machine bring-up 2026-04-27: gh's gist sub-bucket throttled at ~60 reads/hr; a busy session exhausts it; every subsequent `gh api gists/<id>` AND `gh gist view` returns HTTP 403; airc's gist-resolution chain failed silently; discovery hung at "Resolving gist...". Joel: "this limit will kill people." ## Two bugs in one ### 1. set -e + pipefail aborts script on rate-limit The existing chain: ```bash raw_content=$(gh api "gists/$gist_id" 2>/dev/null \ | jq -r '.files | to_entries[0].value.content // empty' 2>/dev/null) ``` With `set -euo pipefail` at airc:9, when `gh api` returns 403: - pipefail propagates the non-zero from gh up the pipeline - the `$(...)` capture inherits the non-zero - set -e aborts the script before reaching the next fallback Net: rate-limit hit = entire script dies with exit 5, no diagnostic, no fallback attempted. Fix: each path wrapped with `|| true` so a non-zero exit becomes empty `$raw_content` and the `[ -z ]` gate flows through to the next fallback. ### 2. All existing fallbacks use the same throttled REST bucket Even with the abort fixed, paths A (gh api+jq) / B (gh view --raw) / C (curl + jq) all hit gist sub-bucket which is the EXACT thing that's exhausted. New fallback: git clone the gist's git remote. Git transport is on a separate quota — keeps working when REST is throttled. Adds ~1s on the slow path, unblocks discovery completely. ## New chain (insertion-ordered fallthrough) 1. gh api + jq (REST, fast — primary path) 2. gh gist view --raw (REST, fallback) 3. **git clone gist remote** (NEW — bypasses REST sub-bucket) 4. curl + jq (REST, anonymous last resort) If you have git, you survive rate-limit. The git-clone path was verified live: while gh api returned 403 in <0.3s, git clone of the same gist returned the JSON envelope cleanly in ~0.3s. ## Test posture (Mac, regression check) - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 The actual rate-limit-recovery path was verified by `bash -x` trace under live throttle: `+ raw_content='{` shows git-clone populating raw_content after both gh paths returned empty. ## Out of scope (filed sep) airc.ps1 has the same gist-resolution chain pattern (REST-only). Same fix applies — Windows iteration step 2 in the canary backlog.
…thon (PR #153 follow-up) (#156) feat(doctor,install): probe sshd readiness so hosting works on Windows + scope ssh-stub probe to python only Joel's directive 2026-04-27: "Both need to host so just part of doctor and/or install" — Windows users need sshd to host airc rooms, but Windows ships OpenSSH client only (server is opt-in capability since Win10 1809). Pre-fix: install printed "All required prereqs present" against a Windows install with no sshd; airc doctor probed for ssh client only. First cross-machine pair silently failed at the ssh-tail step. ## Changes ### `airc doctor` — new `_doctor_probe_sshd` per-platform - **macOS**: launchctl + `systemsetup -getremotelogin` for the Remote Login state. Fix hint: System Settings -> Sharing -> Remote Login. - **Linux/WSL**: `systemctl is-active` on `ssh` (Debian/Ubuntu unit name) and `sshd` (RHEL/Fedora). Fix hints for both pkgmgrs. - **Windows-bash**: `powershell.exe -Command "(Get-Service sshd -ErrorAction SilentlyContinue).Status"` distinguishes: Running → ok Stopped/StopPending/StartPending/Paused → BROKEN with start hint empty → MISSING with Add-WindowsCapability hint - **Other**: info-level skip; doesn't penalize. ### `install.sh` — same probe at install time Same per-platform branches; warn-only (no auto-install since elevation needed on Windows). User runs the printed PowerShell commands once, re-runs installer, sshd is up. ### `_doctor_probe` — scope strict-probe to python only (BUG REGRESSED FROM PR #153) The PR #153 strict-probe applied `--version` to ALL binaries. macOS BSD ssh-keygen exits 1 on `--version` ("illegal option"), so doctor false- positived [BROKEN] on every Mac. The new sshd probe surfaced this regression on its first run (clean Mac doctor output revealed the stale [BROKEN] ssh-keygen line). Fix: only python and python3 have shadow-aliases on Windows (Microsoft Store stubs). Other binaries are uniquely shipped by the user's package manager — bare `command -v` is correct + portable. ## Why this matters "Both need to host" — the airc design assumes every peer is a first-class host candidate. Pre-fix Windows users discovered they COULDN'T host until they hit it the hard way (peers can't connect, no diagnostic). Post-fix, install + doctor surface it immediately with the exact admin-PowerShell commands. ## Test posture (Mac regression) - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 - airc doctor live: 7/7 prereqs ok, 1 sshd MISSING (this Mac has Remote Login off — correctly flagged with the macOS-specific fix). ## Out of scope `airc.ps1` should also gain an equivalent probe + install.ps1 should auto-install + start sshd when run elevated. Queued for Windows iteration step 3.
…needs to be in the install" gap) (#157) feat(install): auto-install + start sshd during install (close architectural gap) Joel's directive 2026-04-27 (via continuum-b69f relay through coord gist): > "if we can prompt the user, we do NOT have them do annoying setup > shit we automate into install, which gets what it needs done, no > later interaction and definitely not MORE after first install. and > detect via doctor if missing. and tell them how to remedy." Translation: 1. install.{sh,ps1} does end-to-end setup with elevation prompts (ONE elevation moment during first install). No separate post-install steps for the user to remember. 2. airc doctor is drift detection — catches when something flipped off after install. Already done in PR #156. 3. Remedy commands are AI-runnable — doctor's output is a contract with the user's AI. Already done in PR #156. Missing piece (this PR): install.{sh,ps1} should actually RUN the missing prereq commands during install, not just probe + report. ## Changes ### install.sh — `_ensure_sshd_running` Per-platform, idempotent (no-op if already running): - **macOS**: probes Remote Login state (launchctl/systemsetup); if off, runs `sudo systemsetup -setremotelogin on` with one sudo prompt. - **Linux**: probes systemctl (Debian's ssh and RHEL's sshd unit names); if missing, installs openssh-server via the platform's package manager + enables-and-starts the right unit. - **Windows-bash**: probes via `powershell.exe Get-Service sshd`; if missing or stopped, self-elevates via `powershell.exe Start-Process -Verb RunAs` with all three commands inline (Add-WindowsCapability + Start-Service + Set-Service Automatic) → ONE UAC prompt for the user. \`AIRC_SKIP_SSHD=1\` short-circuits for headless CI / config-managed environments. ### install.ps1 — `Install-OpenSSHServer` Mirrors the bash logic for the native Windows installer. Probes Get-Service sshd, then Get-WindowsCapability for state. Three commands: Add-WindowsCapability, Start-Service, Set-Service Automatic. Catches admin-required errors and prints the manual fallback (same shape as existing Install-OpenSSHClient). Hooked into the install flow right after Install-OpenSSHClient. ## Idempotency Both install.sh and install.ps1 short-circuit if sshd is already Running. Re-running install.sh on a working box doesn't re-prompt for sudo or UAC. Same for install.ps1. ## Test posture (Mac regression) - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 ## Out of scope End-to-end Mac↔Windows substrate test once both sides have sshd up (parallel work; not blocked on this PR).
…g (PR #156/#157 live-test followups) (#158) fix(sshd-probe): macOS detection without sudo + osascript admin dialog when non-interactive Two issues found while running PR #157 live on Mac 2026-04-27: ## Bug 1: launchctl list (user scope) doesn't show system services Pre-fix probe: ```bash launchctl list 2>/dev/null | grep -q "com\.openssh\.sshd" ``` Bare `launchctl list` is user-scope. Returns user-launched LaunchAgents only — never system-level launchd jobs like com.openssh.sshd. The fallback `systemsetup -getremotelogin` requires sudo to read state. Net: doctor reported `[MISSING] sshd` even when Remote Login was fully enabled and active sshd-session processes were forking. Fix: `launchctl print system` (no sudo needed) lists system services including com.openssh.sshd when Remote Login is on. Anchor regex on service-id boundary so we don't false-positive on per-connection session subkeys (com.openssh.sshd.<UUID>) which exist transiently even when Remote Login is just toggling. ## Bug 2: install.sh sudo path fails in non-interactive contexts When install.sh runs from a Monitor-spawned shell or curl|bash pipe, no TTY is attached. `sudo` then says "a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper." Same problem as Joel hit running this from his Claude Code Bash tool. Fix: detect TTY presence (\`[ -t 0 ] && [ -t 1 ]\`); if interactive, use sudo. If not, fall through to osascript with the native macOS admin GUI dialog (with a branded prompt explaining what airc is doing — Joel 2026-04-27 relay through continuum-b69f). ## Live verification Pre-fix doctor on this Mac (Remote Login enabled live via osascript): ``` [MISSING] sshd -- needed when you HOST a room ``` Post-fix: ``` [ok] sshd (Remote Login enabled) ``` ## Same probe in install.sh The Darwin branch of \`_ensure_sshd_running\` now: - detects via launchctl print system (matching doctor) - splits sudo (TTY) vs osascript (non-interactive) for the elevation - both paths print airc-branded explanation in the admin prompt
…stall) (#159) fix(doctor): tailscale probe uses resolve_tailscale_bin (catches macOS GUI install) Bare `command -v tailscale` false-negatives on every macOS install that came from the App Store or downloaded .dmg — Tailscale.app's binary lives at /Applications/Tailscale.app/Contents/MacOS/Tailscale, not on PATH. Caught live 2026-04-27 when airc doctor reported "tailscale not installed" on this Mac while airc was actively publishing a Tailscale IP (100.91.51.87) in the room gist envelope. resolve_tailscale_bin() already exists (called by host_address_set, tailscale_login_check_or_prompt, etc.) — handles the GUI bundle path AND windows tailscale.exe AND Linux PATH. Doctor probe just needs to use it instead of `command -v`. Live verify on this Mac: - pre-fix: `[info] tailscale (optional) -- not installed` - post-fix: `[ok] tailscale (optional) -- daemon up`
…gnosis — sshd bind EPERM) (#160) fix(install,doctor): Windows HNS port-22 reservation + firewall rule (continuum-b69f diagnosis) Bug found by continuum-b69f mid-Windows-bringup 2026-04-27: \`Start-Service sshd\` failed with "Cannot bind any address" / permission denied even with admin. Root cause: Windows HNS (Host Network Service — backs Hyper-V, WSL2, Docker Desktop) dynamically reserves port ranges at boot. The reservations rotate per-boot and are NOT visible in \`netsh int ipv4 show excludedportrange\` (which only shows static admin reservations). When port 22 randomly falls inside an HNS-held range, sshd's bind() returns EPERM at OS level, regardless of admin status. Sources: - https://keasigmadelta.com/blog/how-to-solve-cannot-bind-to-port-due-to-permission-denied-on-windows/ - docker/for-win#3171 - https://gist.github.com/strayge/481a77d31a94e133a76662877b1a90ca ## Persistent fix (this PR) Two-step persistent workaround applied during admin-elevated sshd install. Both ops idempotent — re-run of install on a healthy box doesn't re-prompt or duplicate state. 1. \`reg add HKLM\\SYSTEM\\CurrentControlSet\\Services\\hns\\State /v EnableExcludedPortRange /d 0 /f\` Disables HNS auto-exclusion. Survives reboots. 2. \`netsh int ipv4 add excludedportrange protocol=tcp startport=22 numberofports=1\` Explicitly reserves port 22 in the static excluded-port-range so HNS can't grab it on subsequent boots. Plus a New-NetFirewallRule for the OpenSSH-Server-In-TCP rule (the capability install usually creates it but it can be missing/disabled on some systems — idempotent check before creating). ## Files changed - \`install.ps1\` — \`Set-HnsPortFreedomFor22\` helper + wired into \`Install-OpenSSHServer\`. Native Windows installer path. - \`install.sh\` — Windows-bash branch's \`_ensure_sshd_running\` now emits a single elevated PowerShell payload that runs ALL the steps (capability install + HNS workaround + firewall rule + start + persist) so Joel/users click UAC ONCE for the whole sshd setup. - \`airc doctor\` — \`[MISSING] sshd\` Windows hint now includes the reg+netsh lines and explains why (HNS quirk). User can run all five commands as a contiguous block to remediate manually. ## Why this matters Pre-fix, even after the user ran the Add-WindowsCapability + Start- Service incantation from PR #156's hint or PR #157's auto-install, they could STILL hit the bind-EPERM if HNS happened to claim port 22 on their boot. Random failure, no diagnostic, looks like a permission bug. Continuum-b69f's diagnosis turns this from an unsolvable random into a one-time install action. ## Test posture (Mac regression) Mac side unchanged behavior; HNS branch only fires on MINGW/MSYS/CYGWIN. - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 ## Out of scope Cross-machine substrate end-to-end test once continuum's Windows host binds port 22 successfully. Parallel work; not blocked on this PR.
…velopes without it (continuum's diagnosis) (#162) fix(parser,prereq): jq is required, not optional — fallback parser corrupts gist envelopes without it Bug found by continuum-b69f Win→Mac e2e 2026-04-27 (forensics in cross-Mac/Windows coord gist): continuum's airc connect from Windows Git Bash succeeded with "Connected to '\"invite\":\"authenticator-fd63'" — JSON envelope syntax leaked into the displayed peer name. Worse: - room_name file never written to disk - subsequent airc msg stored locally with from:"unknown" - broadcast never landed in mac host's messages.jsonl Two bugs from one root cause: **jq missing on Windows Git Bash.** ## Root cause cmd_connect's gist resolver has two paths: 1. JSON envelope parse via jq — sets `resolved` (invite string) AND `resolved_room_name` from `.name` field. 2. Legacy raw-string fallback — bare grep for the first `@.*@` line. When jq is absent on PATH (the default state on Git Bash), path 1 short-circuits silently. Path 2 grabs the whole quoted JSON line including the `"invite":"` key prefix. The downstream @-split (which extracts name@user@host:port) then captures the JSON-key fragment as the peer name. Worse: `resolved_room_name` is ONLY set inside path 1's room-case branch. Path 2 leaves it empty. Hence the `if [ -n "$resolved_room_name" ]; then echo ... > room_name` write at line 2495 never fires. Joiner connects "successfully" but doesn't know what room they're in. Subsequent msg sends queue/ship without room context; host filters them out. ## Fix (three layers) ### Layer 1: jq is now a required prereq (install.sh + install.ps1 + airc doctor) - install.sh: added `jq` to the prereq install loop. pkgname_for maps `jq` → `jqlang.jq` on winget, bare `jq` on brew/apt/dnf/pacman/apk. - install.ps1: new `Install-IfMissing -Name 'jq' -WingetId 'jqlang.jq'` line. - airc doctor: new probe `_doctor_probe "jq" "Gist envelope parser (rooms, addresses)"` flags missing jq with the same install hint shape as other prereqs. ### Layer 2: legacy fallback now strips JSON-key prefix The grep-based fallback can still be reached on minimal environments that genuinely don't have jq (busybox+nothing, weird CI). Pre-fix it captured `"invite":"authenticator-fd63@...` verbatim. Post-fix: `sed -E 's/^[^a-zA-Z]+//'` strips leading non-letter characters before the @-split runs. JSON quotes, key syntax, leading whitespace all stripped uniformly. ### Layer 3: legacy fallback now extracts room name When jq is missing, the fallback also walks the raw_content for `"name": "..."` and captures the value into `resolved_room_name`. Same JSON envelope shape as the jq path; sed-only so it works without any JSON parser. Empty for legacy gists (no envelope) — matches pre-existing behavior on those. ## Why three layers Layer 1 (jq required) is the canonical fix — every install going forward has jq, the JSON path always works. Layers 2+3 are defense in depth: any environment that escapes layer 1 (older airc installs, manual installs, distros where jq install fails) won't silently corrupt — fallback now produces a correct peer name AND the right room_name file. ## Test posture Mac doctor with PR live: all probes [ok] including new jq. ``` [ok] git [ok] gh [ok] gh authenticated [ok] openssl [ok] ssh [ok] ssh-keygen [ok] python3 [ok] jq [ok] sshd (Remote Login enabled) [ok] tailscale (optional) -- daemon up ``` Mac regression: - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 ## Out of scope continuum-b69f's UTF-8 → Latin-1 double-decode on `→` is a separate encoding bug in the bash → python3 → jq pipeline. File for follow-up; this PR is JSON-key-leak + jq-as-prereq.
…root cause) (#164) fix(python): AIRC_PYTHON env var replaces broken export -f shim (THE root cause continuum found) continuum-b69f's traced send 2026-04-27 found THE bug behind every silent-broadcast-failure on Windows Git Bash. Long-form analysis in the cross-Mac/Windows coord gist; tldr below. ## Bug PR #153 added a bash function shim: ```bash if ! python3 --version >/dev/null 2>&1; then if command -v python >/dev/null 2>&1; then python3() { command python "$@"; } export -f python3 2>/dev/null || true fi fi ``` `export -f python3` is supposed to propagate the function into subshells. On Git Bash MINGW, `export -f` succeeds silently but the function does NOT reliably inherit into `$(...)` command-substitution subshells. Result: every callsite that captures `$(python3 -c "...")` output (45+ in airc) bypassed the shim, hit the Microsoft Store stub, exited ~49 with empty stdout. The `|| echo ""` fallbacks on those sites then silently set config values to empty strings. Cascade: - `get_name` → `from:"unknown"` in stored messages - `get_config_val host_target ""` → empty → cmd_send takes HOST path (no `[ -n "$host_target" ]`), mirrors locally only, NEVER SSH-pushes - `get_config_val host_airc_home ""` → empty → would-be wrong path anyway (but moot since SSH was skipped) Net: continuum's Windows airc msg returned exit 0, mirrored locally, broadcast NEVER reached the mac host's messages.jsonl. cmd_send's "queue or die" failure paths never fired because cmd_send thought it WAS the host. Every Win→Mac broadcast invisible-failed. ## Fix (continuum's prescription) Replace the function-shim with a bash variable. Bash variables propagate to subshells unconditionally — no function-export quirks. ```bash if python3 --version >/dev/null 2>&1; then AIRC_PYTHON=python3 elif command -v python >/dev/null 2>&1 && python --version >/dev/null 2>&1; then AIRC_PYTHON=python else echo "ERROR: airc requires a working python3..." >&2 exit 1 fi export AIRC_PYTHON ``` Then sed across airc: every `python3 -c "..."` callsite (45 of them) becomes `"$AIRC_PYTHON" -c "..."`. The two `command -v python3` guards (which became unreliable under the Store-stub case) become `[ -n "${AIRC_PYTHON:-}" ]` — set if and only if a working python resolved at startup. ## Why this matters beyond Win→Mac The same `export -f` leak silently corrupted every config read on Windows Git Bash. Every `airc nick` rendered nicks blank; every `airc whois` walked an empty peer file path; every `cmd_send` was mirroring-locally-only. Three full days of "Windows works" reports were actually "Windows mostly works for read-only commands; sends silent-fail." This fix unblocks the whole Windows code path. ## Test posture (Mac regression — function-shim never fired here) - identity: 19/19 - whois: 5/5 - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 ## Out of scope continuum's secondary observations: 1. `relay_ssh` should fail loudly when host_target is empty rather than silent no-op. Defense in depth — this PR fixes the upstream cause; failing-loudly downstream is an additional safety net. 2. `|| echo ""` patterns on get_config_val / get_name silently mask ANY exec failure (not just Store-stub). Worth reviewing each callsite; out of scope for this PR which fixes the immediate blocker. Both filed as separate issues for follow-up.
…_* config write (continuum's retest) (#165) fix(airc): two PR #164 followups — sed missed line 1372 + harden host_* config write continuum-b69f's PR #164 retest 2026-04-27 found two remaining bugs: ## Bug A: sed missed `python3 -u -c '` at line 1372 PR #164's sed pattern was `python3 -c` — didn't match the `-u` flag sandwiched between python3 and -c at line 1372 (monitor_formatter unbuffered launch). On Windows Git Bash with the Microsoft Store stub, this site silent-failed too: monitor_formatter crashed at launch, the inbound stream went dark, joiner couldn't see anything the host wrote. One-line fix: `python3 -u -c '` → `"$AIRC_PYTHON" -u -c '`. ## Bug B: host_* config write silently no-op'd if ANY bash subst broke continuum's joiner config showed `name`, `host`, `host_target`, `created` but NOT `host_airc_home`, `host_name`, `host_port`, `host_ssh_pub`, `host_identity` — all five fields written together by the heredoc at line 2768. Pre-fix: ```bash HOST_IDENTITY="$host_identity_json" "$AIRC_PYTHON" -c " import json, os c = json.load(open('$CONFIG')) c['host_airc_home'] = '$host_airc_home' c['host_name'] = '$peer_name' c['host_port'] = ${peer_port:-7547} c['host_ssh_pub'] = '''$host_ssh_pub''' ... " 2>/dev/null || true ``` Five bash substitutions into python source. If ANY substitution breaks python parsing (newline in host_ssh_pub, special char in host_airc_home, empty/non-numeric peer_port, etc.) the whole heredoc crashes at parse time. `2>/dev/null || true` swallows the SyntaxError and zero fields land. Five silently-empty config fields downstream: - host_airc_home empty → cmd_send computes wrong remote path - host_name empty → "Connected to ''" banner - host_port wrong → SSH targets wrong port (or 7547 fallback) - host_ssh_pub empty → host's SSH key not in authorized_keys - host_identity empty → airc whois <host> shows (unset) Post-fix: pass everything as env vars; python reads from os.environ. Bash never touches the python source. Also emit stderr to a warn line (not /dev/null) so the future debugger can see it. Also catch ValueError on int(host_port) so a non-numeric value falls back to 7547 instead of dying. ## Pattern lesson bash → python heredoc with bash variable substitution into the python SOURCE is fragile. Any unusual byte in the variable can break python parsing. Same shape as the resolver heredoc that broke pre-PR #155 with set -e + pipefail. Repeat-offender pattern. Consider a sweep: every `"$AIRC_PYTHON" -c "..."` heredoc that contains `$bash_var` substitutions — convert to env-var pass + os.environ. Out of scope for this PR (would touch ~30 sites); file as a separate canary follow-up. ## Test posture Mac regression (5 scenarios, all green): - identity 19/19 - whois 5/5 - part_persists 8/8 - list 4/4 - general_sidecar_default 12/12 End-to-end Win→Mac broadcast verification still pending continuum's retest after pulling this fix.
…Phase 0) (#166) Joel 2026-04-27: "3000 lines of code dear god" → "yes" (start the architectural pivot to airc_core). Today's session shipped 17 PRs, ~half fighting bash → python heredoc fragility (silent SyntaxErrors, function-export leaks, missed sed patterns, swallowed stderr). The pattern is the problem: bash substituting variables INTO python source code is a per-site silent fail. PR #164 fixed the export -f leak via AIRC_PYTHON; PR #165 hardened ONE heredoc with env-var pass; ~30 more heredocs remain. This PR pivots: business logic moves to a Python truth-layer package (airc_core/), bash + ps1 become thin shells that invoke the Python via -m. Same input → same output → same testable code, no more bash-into-python escaping. ## Phase 0: foundation - `lib/airc_core/__init__.py` — package marker. v0.1.0. - airc bash resolves the lib dir at startup (4 candidates, first hit wins; canonicalizes to absolute via cd+pwd so PYTHONPATH stays valid across cwd changes). Sets PYTHONPATH unconditionally. - New debug command `airc debug-pythonpath` echoes the resolved path + tests `import airc_core` end-to-end. - install.sh changes: none needed — the existing clone-everything shape already pulls lib/ along. ## Phase 0a: first function migrated - `lib/airc_core/datetime.py` exposes `iso_to_epoch()` with a CLI entry: `python -m airc_core.datetime iso_to_epoch <ts>`. - Bash `iso_to_epoch` shrinks from 22 lines (3-fallback adapter chain) to 4 lines (single Python module call). - Test harness in scenario_platform_adapters updated to set AIRC_PYTHON + PYTHONPATH for the extracted-adapter shell so the test sees the Python module. ## Why iso_to_epoch as the first migration - Pure logic, no I/O — easiest to verify identical behavior. - Already adapter-fied in PR #151 (clean callsite contract). - Three callsites downstream — proves the pattern works for both the function definition AND its consumers. - Smallest possible blast radius if the pattern flubs. ## Test posture - platform_adapters: 11/11 (was 11/11; iso_to_epoch trio still green through the migrated code path) - part_persists: 8/8 (downstream consumer via heartbeat parse) - list: 4/4 (downstream consumer via _format_relative_time) - general_sidecar_default: 12/12 (sidecar spawn touches the path) ## Pattern for follow-up phases Phase 0a establishes the shape. For each subsequent migration: 1. Identify a heredoc-heavy function in airc bash. 2. Re-implement the logic in airc_core/<module>.py with a CLI entry. 3. Bash function becomes a 1-line `"$AIRC_PYTHON" -m airc_core.<module> <subcommand> "$@"` call. 4. Run integration tests; verify identical bash-side behavior. 5. Same module is callable from airc.ps1 (Phase 2 — drift between bash and ps1 ports goes away mechanically). Priority order for Phase 1 (high-fragility first): - pair handshake JSON build/parse (~80 lines, env-var pass already partially done in #165) - gist envelope build (host's response payload) - gist envelope resolve (joiner's parse — the JSON-key-leak class) - monitor_formatter (the long-running -u -c heredoc; missed by sed in #164, fixed in #165) - host_address_set (network enumeration) - config CRUD (45+ callsites; biggest dedupe but most plumbing) ## Out of scope for this PR - No Phase 1 migrations land here. Joel reviews the SHAPE first. - airc.ps1 still uses its own duplicate logic; that's Phase 2. - The 30+ remaining heredocs in airc bash still exist; they'll migrate one at a time per the Phase 1 priority order.
…1) (#167) feat(airc_core): migrate config CRUD (get_name, get_config_val) to airc_core.config (#152 Phase 1) Continuing the Python truth-layer migration started in PR #166. Phase 1: convert high-risk bash heredocs to airc_core modules incrementally. ## What - New `lib/airc_core/config.py` with `get(config_path, key, default)` + `get_name(config_path)` + CLI entry point. - Bash `get_name` and `get_config_val` shrink from inline python heredocs (with bash-variable substitution INTO the python source) to one-line `"$AIRC_PYTHON" -m airc_core.config get <key> <default>` calls. ## Why 45+ callsites across airc bash use these two helpers. Pre-migration each was an inline `"$AIRC_PYTHON" -c "import json; ...$1...$2..."` heredoc — bash $1 / $2 substituted INTO the python source. If the key or default contained quotes, special chars, etc., python parsing broke silently and the value fell back via `2>/dev/null || echo $2`. Continuum-b69f 2026-04-27 traced one symptom (host_target reading empty even when config.json had it) to this class. Now: CONFIG env var holds the file path; key + default come from argv. Python source is fixed bytes; bash never touches it. ## Test posture - identity: 19/19 (heaviest config-read scenario — name, identity fields, integrations all read via the migrated path) - whois: 5/5 - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - platform_adapters: 11/11 Direct unit-test of the CLI: - valid config → returns name correctly - missing config → returns default - get_name on valid config → name - both subcommands respond as expected ## Next migrations Per the Phase 1 priority queue (high-fragility first): pair handshake JSON build/parse → gist envelope build → gist envelope resolve → monitor_formatter → host_address_set. Each lands as a separate PR; integration tests verify identical bash-side behavior.
…Phase 1) (#168) Four field-extract sites for the host's handshake response (ssh_pub, airc_home, identity, reminder) were inline `python3 -c "import sys, json; print(json.load(sys.stdin).get('FIELD',''))"` heredocs. Same class as get_config_val pre-PR #167 — bash variable substitution into python source is a per-callsite silent-fail vector if the embedded value drifts. Now: response JSON via stdin; field name + default via argv. Python source is fixed bytes. ## CLI shape ``` echo "$response" | "$AIRC_PYTHON" -m airc_core.handshake get_field <name> [default] ``` Handles dict / list values via json.dumps so callers can re-parse (needed for the identity field, which is a nested object). ## Test posture - identity: 19/19 - whois: 5/5 - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - kick: 12/12 Plus direct CLI unit tests (valid response, missing field with default, nested object round-trip, empty stdin → default, garbage input → default). ## What's left in handshake-related code - Host's response BUILDER (line 3236, builds the JSON payload the joiner reads). Bash-substitutes name + airc_home + identity into python source. Same class. Migrate next. - Joiner's payload BUILDER (line 2580, sends payload TO host). Same pattern; same class. Both are smaller migrations following the same shape.
… heredocs (#152 Phase 1 cleanup) (#169) feat(airc_core): collapse _whois_in_scope + resolve_name + cmd_rename heredocs into get_config_val[_in] (#152 Phase 1) Cleanup pass following PR #167/#168. Eight more inline `python -c` heredocs collapsed into one-line calls now that airc_core.config handles the read pattern. ## Sites migrated 1. **resolve_name** (line 1228) — was duplicating the get_config_val logic inline. Now calls get_config_val. 2. **cmd_rename** (line 3369) — same. 3. **_whois_in_scope** (six sites) — host_name, host_identity, host_target (×2), host_airc_home, peer-file's identity, peer-file's host. All collapsed to get_config_val_in or airc_core.handshake get_field. ## New: get_config_val_in Like get_config_val but reads from an arbitrary config.json path. Used by _whois_in_scope's cross-scope walk (#134) which inspects sibling scope state without changing $CONFIG. Same module, same CLI; just different env var per call. ## airc_core.config: dict round-trip Extended `get` to JSON-encode dict/list values (matches handshake.get_field shape). Lets _whois_in_scope read host_identity + peer identity blobs as JSON-encoded strings that callers can re-parse. ## Test posture - whois: 5/5 - whois_cross_scope: 6/6 ← hottest path through _whois_in_scope - identity: 19/19 - kick: 12/12 - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 ## Code reduction ~70 lines of inline python heredoc → ~10 lines of bash function calls. Each removed heredoc was a separate silent-fail vector (bash-substituted env var into python source code). ## Phase 1 progress - ✓ iso_to_epoch (Phase 0a) - ✓ config CRUD core (PR #167) - ✓ handshake response parse (PR #168) - ✓ _whois_in_scope + resolve_name + cmd_rename cleanup (this PR) - next: handshake/gist envelope BUILD sites, identity show/set, monitor_formatter
… Phase 1) (#170) The pair-handshake send was an inline `python -c` heredoc with FIVE bash-variable substitutions into the python source — name, host, ssh_pub, sign_pub, airc_home — plus the connect target as `('$peer_host_only', $peer_port)`. Any unusual character in any field could silently break python parsing. Specifically host_ssh_pub may contain a trailing newline (depending on how openssh-keygen wrote the .pub file); host_target may contain characters that need quoting; identity is a JSON-encoded blob of arbitrary user-set text. Each was a per-callsite silent-fail. ## Migration `airc_core.handshake.send(host, port)` reads all six fields from env: MY_NAME, MY_HOST, MY_SSH_PUB, MY_SIGN_PUB, MY_AIRC_HOME, MY_IDENTITY. Builds the JSON payload, opens TCP socket, sends, reads response, returns it as a string. Exceptions surface to stderr (matches the never-swallow-errors rule); bash captures stderr via `2>&1`. Bash callsite shrinks from 23 lines (inline python heredoc) to 8 lines (env-var pass + module call): response=$(MY_NAME="$my_name" \ MY_HOST="$(whoami)@$(get_host)" \ MY_SSH_PUB="$my_ssh_pub" \ MY_SIGN_PUB="$my_sign_pub" \ MY_AIRC_HOME="$AIRC_WRITE_DIR" \ MY_IDENTITY="$my_identity_json" \ "$AIRC_PYTHON" -m airc_core.handshake send "$peer_host_only" "$peer_port" 2>&1) || _pair_ok=0 ## Test posture Pair-handshake exercising scenarios all green: - tabs: 19/19 (two-tab pair on localhost — exercises send + receive) - identity: 19/19 (exchange identity at handshake) - whois: 5/5 (read identity from response) - kick: 12/12 (multi-peer pairing) - part_persists: 8/8 (sidecar + primary spawning) ## Phase 1 progress - ✓ iso_to_epoch (Phase 0a, PR #166) - ✓ config CRUD core (PR #167) - ✓ handshake response parse (PR #168) - ✓ _whois_in_scope cleanup (PR #169) - ✓ joiner handshake send (this PR) - next: host's response builder (line ~3236), self-heal/discovery heredocs, monitor_formatter
…-line migration, biggest single heredoc) (#171) feat(airc_core): monitor_formatter → airc_core.monitor_formatter (#152 Phase 1, biggest single migration) The biggest single heredoc in airc bash. ~250 lines of Python embedded in a `"$AIRC_PYTHON" -u -c '...'` block, complete with apostrophe- escape gymnastics like `caller'\''s` and `Joel'\''s` because bash single-quoting required them. Migrated to a proper Python module. ## Impact airc bash file: **5897 → 5647 lines** (−250 lines, ~4.2% reduction of the entire script). The migrated function had: - Inactivity watchdog (cross-platform: SIGALRM on POSIX, threading.Timer on Windows) - [rename] handler with chain-repair via stable host id - Ping/pong control message handling with auto-pong subprocess.Popen - Own-send filtering with mid-session rename support - Inbound mirror-to-local-log for joiners (avoids feedback loop on hosts) - Belt-and-suspenders error handling per line so one bad message doesn't kill the formatter All preserved verbatim — same logic, same stdin/stdout contract. The CLI shape: PEERS_DIR=<peers-dir> "$AIRC_PYTHON" -u -m airc_core.monitor_formatter <my_name> Bash function shrinks to 4 lines (was 268). ## Why a real .py file matters here The bash heredoc had: - `'\''` shell-escape sequences scattered through comments (caller's → caller'\''s) — readable Python source now restores natural apostrophes. - No editor syntax highlighting for python (it was inside a bash string). - No way to unit-test individual functions (_rename_files, _find_peer_by_host) without invoking the whole bash + airc stack. Now the module is a regular Python file: lints, syntax-highlights, unit-testable, importable from other airc_core modules if needed. ## Test posture 84 assertions pass across 8 scenarios touching monitor_formatter (every scenario that pairs + sends/receives): - tabs: 19/19 (two-tab message exchange) - identity: 19/19 (identity round-trip + rename) - whois: 5/5 (host_identity propagation) - part_persists: 8/8 (sidecar + primary monitor active) - list: 4/4 - general_sidecar_default: 12/12 - kick: 12/12 (multi-peer monitor traffic) - events: 5/5 (system-event formatting) ## Phase 1 progress - ✓ iso_to_epoch (Phase 0a, PR #166) - ✓ config CRUD core (PR #167) - ✓ handshake response parse (PR #168) - ✓ _whois_in_scope cleanup (PR #169) - ✓ joiner handshake send (PR #170) - ✓ monitor_formatter (this PR — biggest single migration) - next: host's pair-handshake handler heredocs, smaller cleanups
…#152 Phase 1) (#172) feat(airc_core): host pair-handshake accept_one → airc_core.handshake.accept_one (#152 Phase 1) Symmetric counterpart of PR #170 (joiner send) — the HOST'S accept- and-respond heredoc, biggest remaining bash-into-python heredoc with substituted variables. 127 lines of Python with EIGHT bash variable substitutions migrated to a clean Python module. ## Substitutions previously inline - $host_port — the listen port (numeric, but bare-substituted) - $PEERS_DIR — joiner's peer file path - $(timestamp) — bash command-substitution INTO python (highest risk) - $IDENTITY_DIR — host's ssh_key.pub source - $CONFIG — host's identity load path - $name — host's identity name - $reminder_interval — numeric reminder interval - $AIRC_WRITE_DIR — host's airc_home (sent in response) - $MESSAGES — system-event log path Each was a per-callsite silent-fail vector. Continuum traced the write-side variant (#165) earlier today. ## Migration `airc_core.handshake.accept_one()` reads all from env vars (HOST_PORT, PEERS_DIR, IDENTITY_DIR, CONFIG, HOST_NAME, REMINDER_INTERVAL, AIRC_WRITE_DIR, MESSAGES). Bash callsite shrinks from 127 lines (heredoc body) to a 9-line env-var-pass + module call. Same logic preserved verbatim — accept-with-timeout, parent-death detection (`os.getppid() == 1`), authorize joiner SSH key, write peer record (with stable-host stale cleanup), build response, write peer-joined system event. The outer `while true; do ... done &` bash loop unchanged. ## Impact - airc bash: 5647 → 5529 (-118 lines) - Cumulative today (Phase 1): ~370 lines moved out of bash to testable Python modules. ## Test posture (Mac, 89 assertions / 9 scenarios) - tabs: 19/19 (two-tab pair on localhost — exercises full accept loop end-to-end) - scope: 5/5 (multi-cwd pairing across scopes) - identity: 19/19 (identity exchange at handshake) - whois: 5/5 - kick: 12/12 (multi-peer, multiple accepts) - part_persists: 8/8 - list: 4/4 - general_sidecar_default: 12/12 - events: 5/5 (peer-joined system event emission) ## Phase 1 progress - ✓ iso_to_epoch, config CRUD, handshake parse, _whois cleanup, joiner send, monitor_formatter (PRs #166-#171) - ✓ host accept_one (this PR) - next: smaller cleanups (lan_ip resolver, identity/peer config writes, remaining gist-envelope bash heredocs)
…ibility (#152) (#173) Joel 2026-04-27: "think my bigger issue is 5000 line files... like straightforward programming... senior would have hit pause at 500." Lesson saved (memory: flag file size proactively, threshold ~500 not ~5000). Starting Phase 3 — split airc bash into multiple files so each is normal-software-shaped, not a giant monolith. ## What `lib/airc_bash/platform_adapters.sh` (~158 lines, the existing "Platform adapters" marked block from airc) is now its own file. The airc top-level sources it via the lib-dir resolver: if [ -n "${_airc_lib_dir:-}" ] && [ -f "$_airc_lib_dir/airc_bash/platform_adapters.sh" ]; then source "$_airc_lib_dir/airc_bash/platform_adapters.sh" fi Test harness updated — `scenario_platform_adapters` no longer needs to awk-extract the section; it sources the real file directly. ## Why platform_adapters first - Already a self-contained marked region. - Already has integration test coverage. - Smallest blast radius if the source-from-file pattern flubs. - Same shape Phase 0a (iso_to_epoch) used to prove airc_core. ## Impact - airc bash: 5529 → 5371 lines (-158 lines, ~3% of file) - Cumulative bash-side reduction today (Phase 1 + Phase 3 step 1): ~530 lines moved to dedicated files. ## Next Same pattern scales: - lib/airc_bash/cmd_connect.sh (the biggest cmd_*, ~1000-1500 lines) - lib/airc_bash/cmd_send.sh - lib/airc_bash/cmd_doctor.sh - lib/airc_bash/cmd_part.sh + cmd_teardown.sh - lib/airc_bash/helpers.sh (die, validate_peer_name, get_*) After Phase 3, no single file should exceed ~600 lines. ## Test posture - platform_adapters: 11/11 (sourced from real file, all assertions via the same `_adapter_call` shim now pointing at lib/airc_bash/) - tabs / identity / whois / part_persists / list / general_sidecar_default: all green (the airc-startup sourcing path works for the real run)
…tinuum's MSYS catch) (#174) fix(airc_core): use argparse --flags for all paths, not env vars (continuum's MSYS catch + Joel's correct-fix mandate) Joel 2026-04-27: "they arent stupid, --params are far fucking better" + "NEVER DO THE QUICK FIX ALWAYS THE BEST" + "you are an ai. the correct fix is five minutes the quick 1 from my perspective the same." The right fix for continuum-b69f's MSYS path translation bug isn't MSYS_NO_PATHCONV per-callsite (the small fix I was about to ship — that framing alone was the violation). It's giving every airc_core module a proper argparse CLI so paths arrive as `--airc-home /path` flags. argparse-flag args are per-arg-predictable across MSYS path translation, AND the modules present as normal Python CLIs instead of bash-shaped env-var contraptions. ## Changes ### `airc_core.handshake` Refactored to argparse: - `get_field <field> [default]` — unchanged stdin shape - `send <host> <port> --my-name X --my-host Y --my-ssh-pub Z --my-sign-pub W --my-airc-home /path --my-identity-json '{}'` - `accept_one --host-port N --peers-dir /path --identity-dir /path --config /path/config.json --host-name X --reminder-interval N --airc-home /path --messages /path` ### `airc_core.config` Refactored to argparse: - `get --config /path KEY [DEFAULT]` - `get_name --config /path` ### `airc_core.monitor_formatter` Refactored to argparse: - `--peers-dir /path --my-name NAME` ### Bash callsites All env-var-pass patterns replaced with --flags. Cleaner, more readable, no MSYS path-mangling risk on Git Bash. ## Why this matters beyond MSYS Joel 2026-04-27: "you are an ai. the correct fix is five minutes the quick 1 from my perspective the same." Memory saved (feedback_no_quick_fixes.md): the quick-fix reflex is borrowed from human time pressure that AIs don't actually have. Quick fixes are how the 5500-line bash file got built. Always pick the architectural right answer. ## Test posture 101 assertions across 10 scenarios green: - tabs 19, identity 19, whois 5, part_persists 8, list 4, general_sidecar_default 12, kick 12, events 5, platform_adapters 11, whois_cross_scope 6 Plus `--help` output for each module is now standard argparse format.
…#175) feat(airc-bash): extract cmd_doctor + _doctor_* helpers (Phase 3 — airc under 5000) 435 lines (cmd_doctor, _doctor_detect_pkgmgr, _doctor_install_cmd_for, _doctor_probe, _doctor_probe_gh_auth, _doctor_probe_sshd, _doctor_probe_tailscale, _doctor_connect_preflight, _doctor_run_tests) extracted to lib/airc_bash/cmd_doctor.sh, sourced from airc top-level via the lib-dir resolver. ## Impact - airc bash: 5386 → 4952 lines. **Below 5000 for the first time today.** - New file: 435 lines, self-contained. ## Why doctor was a clean candidate - All `_doctor_*` helpers used by cmd_doctor only — no exterior consumers. - The probes use `detect_platform` / `get_config_val` from airc top-level (resolver sources platform_adapters before this file, and config CRUD helpers are still in airc). - Already organized as a marked logical section. ## Live verify `airc doctor` on this Mac: all probes [ok]. git, gh, gh authenticated, openssl, ssh, ssh-keygen, python3, jq, sshd, tailscale — all green via the sourced file. ## Test posture (66 assertions / 6 scenarios) - tabs 19, identity 19, whois 5, part_persists 8, list 4, general_sidecar_default 12 ## Remaining biggest sections in airc - cmd_connect (~1500 lines) — still in airc, biggest remaining slice - cmd_send (~300 lines) - cmd_part / cmd_teardown (~250 combined) - gist envelope build (~200) Continued split brings each below the ~500 threshold Joel called out.
…(continuum's #174 follow-up) (#176) fix(airc_core): config set_host_block subcommand — last env-var-pass site converted to argparse (continuum's PR #174 follow-up) continuum-b69f's #174 retest 2026-04-27 found that PR #174 missed the host_* config WRITE site (the post-handshake "store host details" block). It still used env vars, so MSYS path-translated $host_airc_home on Git Bash before python read it from os.environ. Same silent-fail class as the rest of #174. ## Fix (matches PR #174 pattern verbatim) New subcommand: `airc_core.config set_host_block`. ```bash "$AIRC_PYTHON" -m airc_core.config set_host_block \ --config "$CONFIG" \ --host-airc-home "$host_airc_home" \ --host-name "$peer_name" \ --host-port "${peer_port:-7547}" \ --host-ssh-pub "$host_ssh_pub" \ --host-identity-json "$host_identity_json" ``` Bash callsite is one airc_core invocation; no env-var pass; no python heredoc with bash substitutions; no `2>/dev/null` swallowing errors. The CLI errors are surfaced via stderr per the never-swallow-errors rule. ## Why this matters per Joel's "always the right fix" PR #174 was the right approach for the SEND/PARSE/ACCEPT sites. PR #165 (env-var hardening) was a defensive partial fix at the WRITE site. Today we close the loop — same architecture across all config-mutating sites. ## Test posture 95 assertions / 9 scenarios green: - tabs 19, identity 19, whois 5, part_persists 8, list 4, general_sidecar_default 12, kick 12, events 5, platform_adapters 11 Unit test: - set_host_block writes valid JSON with all fields preserved uncorrupted (path / SSH pubkey / identity dict round-trip)
…nd-to-end (#177) fix(msys): export MSYS2_ARG_CONV_EXCL at airc startup — last layer of cross-machine fix (continuum's catch + verify) continuum-b69f's diagnosis 2026-04-27: even with PR #174 + #176's argparse `--flags`, MSYS Git Bash on Windows translates argv VALUES that look like Unix-rooted paths when bash invokes a Windows-native binary. So `--host-airc-home /Users/joelteply/.airc` arrived at python.exe as `--host-airc-home C:/Program Files/Git/Users/joelteply/.airc`, the joiner cached the corrupted path, the SSH command later sent it to a real Unix host that had no such file. Silent broadcast failure. ## Fix ```bash export MSYS2_ARG_CONV_EXCL="${MSYS2_ARG_CONV_EXCL:-/Users/;/home/;/root/}" ``` Set once at airc startup, exported, every airc_core invocation inherits the same translation policy. Targeted prefix list covers macOS / Linux / root home prefixes without breaking `/tmp/` or `/c/` paths (which DO need translation for `--config "$CONFIG"` where $CONFIG is on the local Windows filesystem). Honors a user override via the `${...:-...}` default-fallback. ## End-to-end verification continuum-b69f shipped a test broadcast from Windows after their local patch: > WORKING TEST: Windows-Mac airc msg via continuum-msyspatch with > targeted MSYS exclude. should land! Verified live in MY host's messages.jsonl on Mac: ``` {"from":"continuum-msyspatch","to":"all","ts":"2026-04-27T23:30:13Z", "msg":"WORKING TEST: Windows-Mac airc msg via continuum-msyspatch with targeted MSYS exclude. should land!","sig":"..."} ``` **Cross-machine Mac↔Windows airc end-to-end working.** This was the last bug in the chain that started at PR #153 (Microsoft Store python3 stub). ## Test posture (Mac, where the env var is a no-op) - tabs 19/19, identity 19/19, whois 5/5, part_persists 8/8, list 4/4, general_sidecar_default 12/12, kick 12/12, events 5/5, platform_adapters 11/11, whois_cross_scope 6/6 ## Today's full chain of cross-machine fixes #153 → #154 → #155 → #156 → #157 → #158 → #159 → #160 → #162 → #164 → #165 → #166 → ... → #176 → this. 27+ PRs to ship a working cross-machine airc on Windows. Every step revealed a new layer.
…lent-drop catch) (#178) fix(encoding): export PYTHONIOENCODING=utf-8 at airc startup (continuum's encoding-drop catch) continuum-b69f traced 2026-04-27: many cross-machine messages were getting SILENTLY DROPPED on Windows with: [airc:formatter] skipped one line: 'charmap' codec can't encode character '→' in position 37: character maps to <undefined> Windows Python defaults to the local code page (cp1252 on US/EU installs) for stdout. Common Unicode chars — →, em-dash, ✓, etc. — have no cp1252 codepoint, so `print(...)` raises UnicodeEncodeError. The formatter's per-line try/except catches it and skips, but from the user's view the message is just missing from the stream. ## Fix ```bash export PYTHONIOENCODING="${PYTHONIOENCODING:-utf-8}" ``` Set once at airc startup. Every Python subprocess airc spawns inherits utf-8 stdio. Honors user override via the default-fallback. Same shape as MSYS2_ARG_CONV_EXCL (#177) — environment-level fix that benefits every airc_core invocation without per-callsite changes. ## Why this is the right shape (per Joel's "always the best") Per-module sys.stdout reconfiguration is also possible, but: - Requires editing every airc_core module - Easy to miss a future module - Doesn't help bash-side code that might also print Unicode Setting PYTHONIOENCODING once at airc startup is the architectural answer — Python is told globally to use utf-8 for stdio, and every subprocess gets the right behavior automatically. ## Test posture 10 scenarios / 102 assertions green on Mac (env var is no-op on Mac where Python defaults to utf-8 already, but the export is harmless). Live python3 print of `→ ✓ ⚠ — em-dash` succeeds with the env var set. ## Follow-up Closes the silent-drop class continuum filed earlier today as #163 (UTF-8 → Latin-1 double-decode). The PYTHONIOENCODING fix is more general — it covers the OUTPUT side (Windows console encoding) AND the INPUT side (Python reading stdin will also use utf-8). #163 can be closed.
…lag (#179) (#183) Three entangled fixes for the multi-scope rename bug filed by vhsm-d1f4 + ideem-local-4bef on 2026-04-28: 1. cmd_rename now writes the new name to ALL scopes' config.json (primary + sidecars), not just the current scope. Reorder so config writes happen BEFORE the broadcast: cmd_send may die() (exit 1) when the scope's monitor is down, so a broadcast failure can't prevent propagation if propagation runs first. 2. cmd_send takes a new --internal flag for informational broadcasts ([rename], etc). When the monitor is down, --internal callers append to the local log and return 0 instead of die()ing. The monitor-down die is appropriate UX for explicit `airc send` (surfaces "you're broadcasting to nobody"), but wrong for [rename] — receivers heal via monitor_formatter's host-fallback on next traffic regardless. 3. cmd_rename's recursion guard moves from AIRC_RENAME_NO_PROPAGATE env var to a --no-propagate flag. Plus a new airc_core.config set_name subcommand replaces the inline-Python heredoc that was quoting- fragile. All params are now --flag form, consistent with the rest of the airc CLI surface (per README convention). Test fixture verifies: primary→sidecars, sidecar→primary, three-scope fan-out, --no-propagate guard, --help/missing-name UX. Integration suite passes — same 3 pre-existing flakes as canary, no regressions (180→181 passing). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (#185) When the airc parent bash dies (terminal close, kill, Monitor tool teardown), the accept-loop subshell reparents to init but stays alive, re-spawning fresh python listeners every iteration. Each listener's own getppid() points at the orphaned bash subshell — never at init — so the existing `getppid()==1` socket-timeout check never fires. Result: orphan listeners hold the host port, accept incoming pair handshakes, write peer records, and stuff joiner SSH keys into authorized_keys — pointing at a dead host with no relay behind it. This is the cause of the integration suite's "port still held after teardown" + "alpha still listening" flakes. Two-layer fix: 1. Bash accept loop: `while kill -0 PARENT` instead of `while true`. Captures airc bash's PID at startup; loop exits the moment that PID disappears, no fresh python is spawned past that point. 2. Python listener: --watch-pid flag wires the same airc bash PID into a daemon thread that polls os.kill(pid, 0) every second. When the parent dies, os._exit(0) breaks out of any in-flight accept()/recv() — covers the in-handshake case the bash check misses while a python is mid-iteration. Both layers watch the SAME PID (airc bash), not their immediate parent, because the immediate parent (accept-loop subshell) outlives airc bash by one iteration in the orphan scenario. Verified: - Orphan repro: SIGKILL airc bash → python exits via parent-watch within 1s, port freed (was: ghost listener + held port forever). - airc teardown still works (watch-pid is opt-in via --watch-pid 0). - Integration suite: 183 passing (vs 180 baseline on canary). Two long-standing flakes resolved: "port 7549 still held after teardown" + "alpha still listening after teardown". One remaining flake ("beta did NOT successfully pair") is unrelated — different scenario. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: clean-install matrix (linux + macos + windows + windows-ps5)
Joel asked 2026-04-28: "guarantee clean mac and windows installs work,
and as much of this as possible is fixed... CI after fixing what we deem
important for release."
Three concurrent jobs on every PR + every push to canary/main:
- clean-install-linux: ubuntu install.sh + airc doctor + smoke (host
stays up, teardown clean).
- clean-install-macos: macos install.sh + same smoke.
- clean-install-windows: windows install.ps1 (pwsh) + airc doctor.
- clean-install-windows-ps5: install.ps1 under Windows PowerShell 5.1 —
the default that ships with Windows. Catches
regressions like #91 (bootstrap fails under
5.1 because airc.ps1 has #Requires -Version
7.0).
Plus, on push to canary/main only (not PRs — rate limits + flaky network):
- integration-suite: full test/integration.sh on ubuntu. The heavy
gate; serves as the canary→main green signal.
Concurrency group cancels superseded runs on the same ref. PR jobs run
on every push to the PR branch.
Until the open Windows install issues land (#91, #94, #95, #96, #97,
#98, #99, #152), the windows jobs treat `airc doctor` failures as
non-fatal — the install + bin-discovery itself still validates,
and we'll tighten to hard-fail once those are resolved.
Open issues this CI surface:
- #91 — bootstrap PS 5.1 (clean-install-windows-ps5)
- #94 — Tailscale winget package ID typo (install)
- #96 — install.ps1 doesn't install OpenSSH Server
- #98 — install.ps1 leaves DefaultShell unconfigured
- #152 — airc.ps1 ~20 commits behind canary
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: smoke uses airc.pid not pgrep argv; doctor exit clears LASTEXITCODE
pgrep -f 'airc connect ...' didn't match the actual argv 'bash /path/to/airc
connect ...' on the runners. Switch to checking airc.pid which is
canonical (and what airc teardown itself reads).
For Windows: PS try/catch doesn't trap native exit codes — airc doctor
exited 1 because gh wasn't authed and tailscale wasn't installed (both
expected in CI), but the catch never fired. Run airc doctor directly,
log the LASTEXITCODE if non-zero, then explicitly exit 0 so the step
treats it as informational (the install + bin-discovery is what we're
gating on right now).
* ci: macOS smoke uses airc.pid too (was left on old pgrep code)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ultShell + Get-RemoteHome (#94, #98, #99) (#187) * ci: real install path — drop AIRC_SKIP_PREREQS, hard-fail on doctor errors The skip-prereqs variant only validated the wiring (clone + symlink + PATH), not that install.{sh,ps1} can actually install everything missing on a stock runner. As Joel put it 2026-04-28: "need to get all installs working e2e or whats the point of a repo?" Changes: - linux + macos: drop AIRC_SKIP_PREREQS, drop sudo apt-get prereq preinstall; install.sh must handle it. - windows pwsh + windows PS 5.1: drop AIRC_SKIP_PREREQS; install.ps1 must handle the winget bootstrap. - airc doctor: hard-fail on non-zero exit. Was non-fatal during the initial wiring-test phase; now that real install is exercised, doctor must report environment-clean for the job to pass. This will surface the real Windows install issues (#91, #94, #96, #98, #99, #152) as CI failures so we can fix them with confidence. May also surface Linux/macOS prereq gaps that the skip-prereqs variant masked. * fix(install): Tailscale winget id case (#94); doctor exits 0 (informational) #94: install.ps1 uses 'tailscale.tailscale' (lowercase). winget --exact is case-sensitive, returns "No package found", install loop swallows the error as non-fatal, and the post-install probe reports "install completed but probe still fails." Result: every Windows install lacks Tailscale, even though the install log claims otherwise. Also fixed the same lowercase id in airc.ps1's user-facing fix-hint messages (line 328, 1293, 1414, 1420). Doctor: airc.ps1's Invoke-Doctor leaks $LASTEXITCODE from external probes (`& gh auth status` etc), so the script's natural-end exit picks up whatever the last external returned — typically 1 on a fresh / CI install where gh isn't authed. Bash doctor (cmd_doctor.sh) just sets a counter and prints a summary, no exit, which is the documented contract for the default `airc doctor` (informational, like `git status`). The hard-fail gate is `airc doctor --connect` (#80), which is the documented preflight before connecting. Match the contract: explicitly set $LASTEXITCODE = 0 at the end of the default doctor. Bonus: .gitignore now excludes __pycache__/ + *.pyc — they leaked through earlier when running airc_core CLIs locally during testing. * fix(install.ps1): explicit exit 0 — `tailscale status` leaked LASTEXITCODE=1 Same pattern as the airc.ps1 doctor leak: external probes (notably `tailscale status` when the user hasn't logged in yet — a normal post-install state) leave $LASTEXITCODE non-zero, and PowerShell's script natural-end exit picks it up. Every clean install on a fresh runner / VM exited 1 even though the install fully succeeded. Explicit `exit 0` after the final guidance banner. * ci: re-trigger after macOS job hung overnight * fix(windows): DefaultShell=bash (#98) + Get-RemoteHome forward-slash (#99) Two tightly-coupled fixes that together make Windows airc HOSTS actually work end-to-end. Without these, every Windows-hosted room failed the moment a peer tried to send a message. #98 — install.ps1: Set-OpenSSHDefaultShellBash Windows OpenSSH defaults DefaultShell to cmd.exe. cmd.exe lacks `cat`, POSIX redirects, and the rest of the shell vocabulary that airc remote commands rely on (`cat >> $rhome/messages.jsonl && echo __APPENDED__`, etc.). Without this fix, every airc msg from a peer to a Windows host silently fails — the cmd.exe error goes to ssh stderr (which `airc send` looks at, but only for specific patterns), the message gets [QUEUED] forever, the user sees nothing. Locate Git for Windows bash.exe, write to HKLM:\SOFTWARE\OpenSSH\ DefaultShell. Idempotent — only writes when the registry value differs. Falls through with a loud warning if bash.exe can't be found (Git for Windows is already a hard prereq, so this should never fire in the install.ps1 flow). #99 — airc.ps1: Get-RemoteHome forward-slash conversion The host_airc_home config value is captured as a Windows path with backslashes ('C:\Users\Administrator\Documents\Cambrian\.airc'). When interpolated into an SSH remote command and the remote shell is bash (which #98 ensures), bash interprets the backslashes as escape characters and strips them — producing garbage like 'C:UsersAdministratorDocumentsCambrian.airc'. The redirect target becomes a non-existent relative path and `cat >>` silently fails. Forward-slash form ('C:/Users/.../.airc') is interpreted correctly by bash as an absolute path; Windows kernel32 accepts forward slashes everywhere it accepts backslashes, so the on-disk write on the host succeeds. Closes #98, #99. Together with #94 (Tailscale typo, already in this PR) the install.ps1 → airc.ps1 path is now end-to-end functional on a clean Windows install. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install.ps1): ASCII-ify em-dashes — PS 5.1 reads UTF-8-without-BOM as cp1252 PS 5.1's parser barfed on em-dashes (U+2014 = 0xE2 0x80 0x94 in UTF-8 which Windows-1252 misreads) inside double-quoted strings in the new Set-OpenSSHDefaultShellBash function. Pre-existing em-dashes in comments have been there a while and passed because comment parsing is more tolerant; new ones in expandable strings broke the parse. Replaced all em-dashes in install.ps1 with ASCII '--'. install.ps1 is the bootstrap script — must work from default Windows PowerShell 5.1 where the user lands by default, and that means staying ASCII-clean. (airc.ps1 is fine — it's #Requires -Version 7.0 so PS 5.1 won't parse it; pwsh handles UTF-8 without BOM correctly.) * fix(install.sh): auto-skip sshd setup when CI=true (macOS hangs forever) macOS install.sh _ensure_sshd_running falls through to osascript 'do shell script with administrator privileges' when no TTY is attached (CI runners). osascript opens a GUI admin prompt waiting for password / Touch ID — there's nobody home in CI, so it hangs forever and the runner job silently consumes its full 6-hour timeout. Auto-detect CI=true (GitHub Actions, GitLab, Travis, CircleCI, Jenkins, etc. all set it) and skip the sshd setup block when present. Same effect as AIRC_SKIP_SSHD=1 but no manual env-var wiring per workflow. The hang manifested in PR #187's macOS job — install.sh was visibly stuck in 'Stage install.sh + run' for 5+ minutes with no progress while the linux + windows jobs completed in under a minute. * ci(install.sh): also skip Tailscale install when CI=true (it's optional) brew install --cask tailscale on macos-latest runners is multi-minute (download + GUI app install). Tailscale is documented as optional (LAN mesh works without it) and there's no tailnet behind the CI runner. Same CI=true gate as the sshd skip. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#189) Two bugs Joel reported in #184 (high severity, violates CLAUDE.md "never swallow errors"): PART 1 (joiner-side, well-understood): the escalation banner before exit-99 was stderr-only. Monitor-style stdout-only consumers (Claude Code Monitor tool, integration tests, simple `airc join | tee log`) got a silent disconnect with zero diagnostic on their primary surface. Fix: print escalation to BOTH stdout (single-line, parseable) and stderr (multi-line, banner-style, log-friendly). The stdout line uses the standard `airc:` prefix consumers already filter on. Daemon-aware: detect whether `airc daemon install` has been run; tell the user explicitly whether the upcoming exit-99 will trigger self-heal (daemon present → launchd/systemd respawn) or just kill the relay (no daemon → user must `airc join` again, hint to install daemon for auto-recovery). New helper `_daemon_installed` checks for the launchd plist or systemd user unit on disk — sibling to the existing cmd_daemon_status logic. PART 2 (host-side, unconfirmed): Joel observed the host monitor silently exit despite the loop being `while true; ... || true; sleep 1; done`. Root cause unidentified (re-exec subprocess plumbing? signal trap leak?). Add a loud diagnostic AFTER the while-true so any future fall-through leaves evidence: echo "airc: host monitor loop exited unexpectedly — restart with: airc join" Diagnostic, not a fix — but it satisfies "never swallow errors" while the root cause is being hunted. Closes the joiner-side half of #184; host-side stays open for further diagnosis. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Run the elevated payload. Start-Process exits 0 if it could | ||
| # launch the elevated process; the payload's own exit code is | ||
| # what we care about (it explicitly `exit $LASTEXITCODE`s based | ||
| # on try/catch). | ||
| powershell.exe -NoProfile -Command "Start-Process powershell -Verb RunAs -Wait -ArgumentList '-NoProfile -Command \"$_elevated_payload\"'" 2>&1 \ | ||
| || _elev_rc=$? |
There was a problem hiding this comment.
The elevated PowerShell payload is interpolated directly into a -Command "$_elevated_payload" argument. Because _elevated_payload contains many embedded double quotes and newlines, this is very likely to break command-line parsing (and can fail unpredictably) on Windows Git Bash. Safer pattern: write the payload to a temp .ps1 file and Start-Process powershell.exe -File <temp> (or use -EncodedCommand) so quoting/newlines don’t corrupt the script content.
| def cmd_set_host_block(args) -> int: | ||
| """Atomically write the post-handshake host_* fields into config. | ||
|
|
||
| Replaces a fragile env-var-passed python heredoc that bit on MSYS | ||
| Git Bash (continuum-b69f's catch 2026-04-27): MSYS translates env | ||
| var values that look like Unix paths INTO the Windows-binary | ||
| subprocess, so /Users/... silently became C:/Program Files/Git/... | ||
| Argparse `--flags` are per-arg-predictable (callers can `//`-prefix | ||
| individual values or use MSYS2_ARG_CONV_EXCL targeted-ly), and | ||
| the python source is fixed bytes regardless of the values. | ||
| """ | ||
| try: | ||
| c = json.load(open(args.config)) | ||
| except (OSError, ValueError) as e: | ||
| print(f"airc-config-set-error: cannot read {args.config}: {e}", file=sys.stderr) | ||
| return 1 | ||
| c["host_airc_home"] = args.host_airc_home or "" | ||
| c["host_name"] = args.host_name or "" | ||
| try: | ||
| c["host_port"] = int(args.host_port) | ||
| except (TypeError, ValueError): | ||
| c["host_port"] = 7547 | ||
| c["host_ssh_pub"] = args.host_ssh_pub or "" | ||
| try: | ||
| c["host_identity"] = json.loads(args.host_identity_json or "{}") | ||
| except ValueError: | ||
| c["host_identity"] = {} | ||
| try: | ||
| json.dump(c, open(args.config, "w"), indent=2) | ||
| return 0 | ||
| except OSError as e: | ||
| print(f"airc-config-set-error: cannot write {args.config}: {e}", file=sys.stderr) | ||
| return 1 |
There was a problem hiding this comment.
cmd_set_host_block also writes config via json.dump(..., open(args.config, "w")) despite the docstring stating it’s atomic. Same risk of partial writes/corruption on interruption; recommend temp-file + os.replace() and with open(...) for both read/write handles.
| payload = json.dumps({ | ||
| "name": args.my_name, | ||
| "host": args.my_host, | ||
| "ssh_pub": args.my_ssh_pub, | ||
| "sign_pub": args.my_sign_pub, | ||
| "airc_home": args.my_airc_home, | ||
| "identity": json.loads(args.my_identity_json or "{}"), |
There was a problem hiding this comment.
cmd_send builds payload using json.loads(args.my_identity_json ...) outside the socket try/except. If --my-identity-json is malformed, this will raise and crash without a clear error, bypassing the existing error handling. Suggest validating/parsing the JSON in a try/except and returning a non-zero exit with a helpful stderr message.
| payload = json.dumps({ | |
| "name": args.my_name, | |
| "host": args.my_host, | |
| "ssh_pub": args.my_ssh_pub, | |
| "sign_pub": args.my_sign_pub, | |
| "airc_home": args.my_airc_home, | |
| "identity": json.loads(args.my_identity_json or "{}"), | |
| try: | |
| identity = json.loads(args.my_identity_json or "{}") | |
| except (ValueError, TypeError) as e: | |
| print(f"airc-handshake-send-error: invalid --my-identity-json: {e}", file=sys.stderr) | |
| return 1 | |
| payload = json.dumps({ | |
| "name": args.my_name, | |
| "host": args.my_host, | |
| "ssh_pub": args.my_ssh_pub, | |
| "sign_pub": args.my_sign_pub, | |
| "airc_home": args.my_airc_home, | |
| "identity": identity, |
| if b"\n" in data: | ||
| break | ||
|
|
||
| joiner = json.loads(data.decode().strip()) |
There was a problem hiding this comment.
cmd_accept_one does joiner = json.loads(data.decode().strip()) without any validation/exception handling. A malformed/partial handshake line will crash the listener (DoS) and may leave the connection/socket unclosed. Recommend wrapping decode/parse in try/except, closing conn, and returning cleanly (or emitting a structured error response) on bad input.
| joiner = json.loads(data.decode().strip()) | |
| try: | |
| payload = data.decode().strip() | |
| joiner = json.loads(payload) | |
| if not isinstance(joiner, dict): | |
| raise ValueError("handshake payload must be a JSON object") | |
| except (UnicodeDecodeError, json.JSONDecodeError, TypeError, ValueError) as e: | |
| print(f"airc-handshake-accept-error: invalid handshake payload: {e}", file=sys.stderr) | |
| conn.close() | |
| sock.close() | |
| return 1 |
| def _rename_files(peers_dir: str, old: str, new: str) -> bool: | ||
| old_json = os.path.join(peers_dir, f"{old}.json") | ||
| new_json = os.path.join(peers_dir, f"{new}.json") | ||
| if not os.path.isfile(old_json): | ||
| return False | ||
| try: | ||
| os.rename(old_json, new_json) | ||
| d = json.load(open(new_json)) | ||
| d["name"] = new | ||
| json.dump(d, open(new_json, "w"), indent=2) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
_rename_files (and _find_peer_by_host) uses json.load(open(...)) / json.dump(..., open(...)) without context managers, which can leak file descriptors in a long-running monitor (renames may happen repeatedly). Use with open(...) as f: for both reading and writing, and consider not swallowing all exceptions silently here to aid debugging when a rename fails.
…ey ACLs (continuum's catches) (#197) * fix(install.sh): stage payload as .ps1 file + ssh-keygen -A for hostkey ACLs Two Windows install bugs found via Mac↔Windows Claude debug loop on issue #196 (continuum-b69f testing on real Windows MINGW64): 1. **Inline payload mangled by 4-layer quote escaping.** Pre-fix: `... -ArgumentList '-NoProfile -Command "$_elevated_payload"'` The payload contained many "" (PS strings) and \\ (registry paths); bash double-quoted → ps outer -Command → Start-Process ArgumentList single-quoted → inner -Command double-quoted. Each layer ate quotes differently. PowerShell never parsed the payload, the elevated window opened + ran nothing + closed silently. No transcript ever written. Joel saw a "OpenSSH installed + started" success message contradicted by a missing-transcript warning on the same run. Fix: stage payload as a .ps1 file in $CLONE_DIR, run via `Start-Process -File <path>`. Zero-quoting on the boundary; the .ps1 file is plain PowerShell and quotes/backslashes work natively. 2. **sshd Start-Service fails with WIN32_EXIT_CODE 1067 ("terminated unexpectedly") on every fresh Windows OpenSSH install** because host-key files exist with overly-permissive ACLs (Authenticated Users / BUILTIN\\Users / Everyone). sshd refuses to load them ("sshd: no hostkeys available -- exiting"). Fix: add `ssh-keygen -A` to the elevated payload between the capability install and Start-Service. Idempotent — generates missing host keys AND restores correct ACLs (SYSTEM + Admins only) on existing ones. continuum-b69f's diagnosis. 3. **Bash side now re-queries sshd state post-elevation** as belt- and-suspenders. Previous behavior printed "OpenSSH installed + started" if the elevated payload exit was 0, even when no transcript was written and sshd wasn't actually running. The silent-success- while-broken path was the worst version of this bug. Now: bash calls `Get-Service sshd` from non-elevated PS; if state isn't "Running" it surfaces a "partial install" warning even when elevated exit was 0. Verified by continuum-b69f on real Windows MINGW64: PR #195 (which this PR builds on) now produces a complete transcript dumped to bash terminal. Without the ssh-keygen -A addition though, sshd Start-Service still failed in his run — that's what this PR adds. * fix(install.sh): kill em-dash + drop global try/catch + parse-check before UAC Three real bugs hiding behind one symptom on continuum-b69f's Windows machine: install reported "OpenSSH installed + started" while sshd was actually crashloop-stopped with exit 1067 ("no hostkeys available"). Joel called it "amateur try/catch" -- he was right. 1. Em-dash (U+2014) in a string literal mis-parsed under cp1252. PowerShell 5.1 reads BOMless .ps1 files as the system codepage (cp1252 on most Windows). UTF-8 em-dash is bytes E2 80 94. Byte 94 in cp1252 is RIGHT-DOUBLE-QUOTATION-MARK. Parser sees "...$path " ...rest" -- treats the trailing 94 as a closing string quote and the rest of the file fails to parse. Nothing executes. No log written. Elevated window blinks closed silently. Fix: heredoc is now ASCII-only AND we prepend a UTF-8 BOM as defense-in-depth so future edits don't regress. 2. Global try/catch + $ErrorActionPreference = "Stop" hid the parse error completely. The parse error happens BEFORE Start-Transcript runs -- nothing in the try/catch could catch it because the parser never reaches the try at all. The bash side saw "no transcript written" and printed the misleading "UAC denied or Start-Process failed" warning. Fix: drop both. Each step runs plainly. PowerShell prints native errors to the transcript and execution continues. Bash side already re-queries Get-Service sshd post-elevation as the source- of-truth verdict, so we don't need the script's exit code to lie about success. 3. Parse errors didn't surface until after UAC. Fix: bash side now runs [Parser]::ParseFile on the staged .ps1 from a non-elevated process before Start-Process is called. If any parse errors exist, we print them and abort -- no UAC prompt, no silent close, the user sees exactly what's wrong. Per Joel: "we prefer parser issues to actually error" -- this is how they actually error. Verified locally on continuum-b69f's box: new payload parses clean (456 tokens, no errors). Will end-to-end-test next. * fix(install.sh): icacls-reset host key ACLs (ssh-keygen -A alone is not enough) Previous commit's diagnosis was half-right: yes the host-key step needs work, but ssh-keygen -A is for *generating missing keys*, not for fixing ACLs on existing ones. Confirmed by capturing the elevated transcript on continuum-b69f's box -- ssh-keygen -A produced no output at all (existing keys were already there, nothing to do), and sshd still failed Start-Service with exit 1067. Ran sshd -ddd directly to see the underlying file-open errors: Failed to open file: ...ssh_host_rsa_key error:5 (ACCESS_DENIED) Failed to open file: ...ssh_host_rsa_key error:13 (ACL secure_permission_check failed) So sshd-as-LocalSystem can't read the host keys *and* their ACLs flunk sshd's own security check. Two distinct ACL problems, both fixed by the same pattern: take ownership, wipe inheritance, grant SYSTEM + BUILTIN\Administrators full control, no other ACEs. Tools considered: - FixHostFilePermissions.ps1: removed from Windows-OpenSSH years ago - OpenSSHUtils PS module: official, but PSGallery dep + module trust prompt = friction we don't want for an install script - icacls: in-box on every Windows + bulletproof. Picked this. The new step: takeown /F <key> # become owner icacls <key> /reset # wipe inherited ACEs icacls <key> /inheritance:r /grant SYSTEM:F /grant Administrators:F Output is captured per-key in the transcript so any failure is visible. ssh-keygen -A still runs first (cheap, idempotent) so any *missing* keys get auto-generated before the ACL fix runs. * fix(install.sh): delete + regen host keys (icacls /grant alone insufficient for sshd) icacls /grant SYSTEM:F /grant Administrators:F succeeded per the transcript on continuum-b69f's box, but sshd-as-LocalSystem still refused to load the keys with errors 5+13 (ACCESS_DENIED + ACL fails secure_permission_check). The post-fix ACLs are technically correct (SYSTEM + Admins only, no inheritance), but OpenSSH's permission check is fragile w.r.t. owner identity and explicit-vs-inherited handling. Cleaner: delete any existing host_key files and re-run ssh-keygen -A. Since ssh-keygen -A here runs from an elevated SYSTEM-context PowerShell, it sets the right owner (SYSTEM) and ACEs at creation time -- which sshd accepts. This sidesteps every "what does icacls think SYSTEM:(F) means" question entirely. Safe at install time: the host hasn't published any fingerprint to peers yet, so regenerating doesn't break anything. Subsequent installs where sshd is already Running (state == Running) skip this whole ensure_sshd_running block via the case statement. Also added a post-regen `icacls <rsa-key>` dump to the transcript so we can see at a glance what the resulting ACL looks like -- saves a UAC round-trip the next time something looks off. * fix(install.sh): strip creator ACE that ssh-keygen -A leaves on host keys Found via post-regen ACL dump on continuum-b69f 2026-04-28: C:\ProgramData\ssh\ssh_host_rsa_key BUILTIN\Administrators:(F) NT AUTHORITY\SYSTEM:(F) BIGMAMA\green:(M) <-- the bug ssh-keygen -A on Windows leaves an ACE for whichever user ran it (the creator), even when running elevated. OpenSSH's secure_permission_check rejects any non-(owner|SYSTEM|Administrators) ACE -- so the freshly regenerated keys still failed sshd's check, even though they had no inheritance and SYSTEM + Admins had Full Control. Fix: after ssh-keygen -A, run icacls /remove:g $(whoami) on each host_*_key to strip the creator's ACE. Combined with /inheritance:r + /grant SYSTEM:F + Admins:F, the resulting ACL is exactly what sshd wants: just SYSTEM and Administrators, no inheritance, no extras. The post-fix ACL is dumped to the transcript so we can verify it visually -- and so future "wait sshd still won't start" diagnoses have a paper trail of what the ACL looked like. * fix(install.sh): also chown host keys to SYSTEM (icacls /setowner) Found via Get-Acl owner check on continuum-b69f 2026-04-28: even after removing creator's ACE, ssh-keygen -A leaves the file OWNER as BIGMAMA\green (the elevated user). OpenSSH's secure_permission_check also looks at owner -- if the owner isn't in {SYSTEM, Administrators, running sshd user}, the check fails with error 13 even though access control entries are correct. Adding icacls /setowner 'NT AUTHORITY\SYSTEM' before the inheritance and grant calls so SYSTEM owns the key. Owner = SYSTEM, ACEs = SYSTEM + Admins, no creator, no inheritance -- the canonical OpenSSH-on- Windows host key permission state. * chore(install.sh): surface sshd dry-run + owner in transcript Adds a 'sshd -t' dry-run step from the elevated context and dumps the post-fix file owner alongside the ACL. Goal: when Start-Service sshd fails, the transcript shows exactly what sshd itself complains about ('no hostkeys available' vs 'bad ownership' vs config syntax) without needing another UAC round-trip to query. * fix(install.sh): reset C:\ProgramData\ssh + logs/ folder ACLs (the actual MS-documented cause) WebSearch turned up the exact MS Learn KB for our symptom (sshd -t passes from elevated, Start-Service fails 1067, no event log entry): https://learn.microsoft.com/en-us/troubleshoot/windows-server/system-management-components/error-1053-1067-7034-after-update-openssh-doesnt-start "This issue occurs if the C:\ProgramData\ssh and C:\ProgramData\ssh\logs folders have incorrect permissions. The permissions might be too limited or too open. For example, the SYSTEM account or the Administrators group might not have write permissions. For a second example, regular users might have write or full control permissions." Required ACL on each folder: SYSTEM : Full Control Administrators : Full Control Authenticated Users : Read & execute (no write) Owner: SYSTEM. Up to this commit we'd been fixing the host_*_key file ACLs only, never the parent folder. The Microsoft fix is on the FOLDER. Adds a new elevated-payload step that sets owner + inheritance + ACEs on both C:\ProgramData\ssh and C:\ProgramData\ssh\logs with (OI)(CI) inheritance flags so newly-created files inherit correctly. The Oct-2024 update introduced this strictness; the March-2025 update loosened it back into a warning ("Event ID 4: write access is granted to the following users: ..."), so machines fully patched past March 2025 may not need this. But continuum-b69f's box (Windows 11 24H2, build 26100.8115, otherwise fully patched) is still hitting the strict-mode failure -- so applying the documented fix is still required. * fix(install.sh): restart HNS service after port-22 reservation (the actual blocker) OpenSSH/Admin event log on continuum-b69f revealed the real blocker: sshd: error: Bind to port 22 on 0.0.0.0 failed: Permission denied. sshd: error: Bind to port 22 on :: failed: Permission denied. sshd: fatal: Cannot bind any address. Even with the HNS reg key (EnableExcludedPortRange=0) set AND netsh showing port 22 in the excluded range ('22 22 *' administered), sshd-as-LocalSystem still got EACCES on bind. HNS service was holding port 22 at a layer below netsh visibility -- the reg key + netsh reservation only take effect after a Restart-Service hns (or reboot). Adds an HNS restart immediately after the port-22 reservation step. Now sshd can actually bind port 22 when Start-Service runs the next step. This was already documented in continuum-b69f's memory file (reference_airc_windows.md) but the install.sh implementation never actually restarted the service. The host-key permission saga from the prior 7 commits in this branch turned out to be a sidequest -- those issues were real but not the blocker. sshd -t (which doesn't bind a socket) was passing the whole time. The real failure was at bind time, not at config-load time.
…ing (#199) fix(install.sh): auto-run 'gh auth setup-git' so gist ops don't prompt Joel hit this on 2026-04-28 -- Windows install with gh authenticated in keyring (gh auth status: Logged in to github.com), but every git operation against gist.github.com triggered a GUI password popup. Repeating, every airc op that touched a gist fired a fresh prompt. Cause: gh auth login stores its token in keyring/credman, but does NOT automatically register itself as git's credential helper. So git itself doesn't know how to use gh's token -- it falls back to asking the user for a password on every HTTPS push/fetch. The official one-liner is `gh auth setup-git`, which registers `gh auth git-credential` as the credential helper for github.com URLs in ~/.gitconfig. After this, git sees an HTTPS github.com URL, delegates auth to gh, gh hands back the token from its store, no prompt. Microsoft-supported, idempotent, ships with gh CLI itself. This goes in ensure_prereqs right after the gh-auth-status check, so fresh installs get it automatically. Skipped if already configured (idempotency check via `git config --get-all credential.https://github.com.helper | grep gh`).
…in) (#200) Joel 2026-04-28 ~01:00Z: "fix the monitor man / i cant go to bed till this is fixed". Windows had no daemon path -- `airc daemon install` died on $(uname -s) with "not supported on MINGW64_NT-...". Result: the only way to keep airc alive on Windows was to leave a Git Bash window open running `airc join`. nohup+disown didn't survive parent shell exit on MINGW64. Adds a Windows branch to cmd_daemon_install / uninstall / status mirroring the launchd (mac) and systemd (linux) patterns. ## Mechanism: HKCU Run-key, not Task Scheduler First attempt was schtasks //SC ONLOGON, but Windows requires admin to create per-user logon-triggered scheduled tasks (Access Denied for non-elevated users, even with //RL LIMITED). Per Joel: "i just want whatever is least hassle and also robust" -- forcing a UAC prompt at 'airc daemon install' time is exactly the kind of friction we kill. HKCU\Software\Microsoft\Windows\CurrentVersion\Run is the per-user autostart hive. Writing to it with `reg add` requires no admin (HKCU is user-scope), fires at every interactive logon for the user, and matches launchd-Agent / systemd-user semantics exactly. ## Implementation 1. `_daemon_os` returns "windows" on MINGW*/MSYS*/CYGWIN*. 2. `_daemon_install_schtasks` (kept the function name for grep continuity even though it's now reg-based) writes a launcher .bat to $scope/airc-daemon.bat that: - sets AIRC_HOME + AIRC_BACKGROUND_OK - exec's `bash -lc 'airc connect'` - on exit, logs to daemon.err and `goto loop` after 5s (matches launchd KeepAlive / systemd Restart=always) 3. `reg add` registers `cmd /c start "" /MIN "<launcher.bat>"` under HKCU Run, key name `airc-monitor`. 4. Fires-and-forgets `cmd /c start /MIN <launcher>` immediately so user doesn't need to logout/login to start the monitor. 5. uninstall: reg delete + kill + rm launcher .bat. 6. status: reg query for the entry + ps for the running airc-connect (matches PPID=1 orphan or falls back to airc.pid lookup). ## Verified locally on continuum-b69f $ airc daemon install ✓ Registered HKCU Run entry 'airc-monitor' (runs at every Windows logon) ✓ Started monitor in detached cmd window (minimized) $ airc daemon status Status: RUNNING (PID 341089) $ airc daemon uninstall ✓ Removed HKCU Run entry 'airc-monitor' ✓ Killed running daemon launcher process(es) ✓ Removed /c/Users/green/.airc/airc-daemon.bat $ airc daemon install # idempotent reinstall ✓ Registered ... ✓ Started monitor ... Detached process survives the launching bash exit (which `nohup & disown` could not on MINGW64). ## Note on AIRC_BACKGROUND_OK The launcher sets this env var because `airc connect` may otherwise refuse to run when not on a TTY. Same hint as the launchd plist's EnvironmentVariables block.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 15 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| $reg = (Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\hns\State" -Name "EnableExcludedPortRange" -ErrorAction SilentlyContinue).EnableExcludedPortRange; | ||
| $regChanged = $false | ||
| if ($reg -ne 0) { | ||
| reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /d 0 /f | Out-Null; |
There was a problem hiding this comment.
In the staged elevated PowerShell payload, the reg add ... EnableExcludedPortRange line omits /t REG_DWORD. reg add defaults to REG_SZ, which can set the wrong registry type and make the HNS workaround ineffective. Update the payload (and the printed manual-fix snippet) to specify the value type explicitly.
| reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /d 0 /f | Out-Null; | |
| reg add "HKLM\SYSTEM\CurrentControlSet\Services\hns\State" /v "EnableExcludedPortRange" /t REG_DWORD /d 0 /f | Out-Null; |
| printf " [MISSING] sshd -- needed when you HOST a room\n" | ||
| printf " Fix (admin PowerShell — five lines, run all together):\n" | ||
| printf " Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0\n" | ||
| printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /d 0 /f\n" |
There was a problem hiding this comment.
The Windows sshd manual fix hint uses reg add ... EnableExcludedPortRange /d 0 without specifying /t REG_DWORD. Since reg add defaults to REG_SZ, this guidance may lead to the wrong registry value type. Consider updating the printed command to include /t REG_DWORD so the workaround reliably applies.
| printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /d 0 /f\n" | |
| printf " reg add HKLM\\\\SYSTEM\\\\CurrentControlSet\\\\Services\\\\hns\\\\State /v EnableExcludedPortRange /t REG_DWORD /d 0 /f\n" |
| def cmd_set_name(args) -> int: | ||
| """Atomically write the identity name into config.json. | ||
|
|
||
| Replaces the inline-Python heredoc that lived in cmd_rename. With | ||
| multi-scope rename propagation (#179), cmd_rename writes the name | ||
| into the primary scope AND every sidecar scope's config; doing it |
There was a problem hiding this comment.
The docstring says this write is "Atomically" performed, but the implementation writes directly to args.config via json.dump(..., open(..., "w")), which can truncate/corrupt the file if interrupted mid-write. Either implement an atomic write (write to a temp file + os.replace) or adjust the docstring to avoid claiming atomicity.
| def cmd_set_host_block(args) -> int: | ||
| """Atomically write the post-handshake host_* fields into config. | ||
|
|
||
| Replaces a fragile env-var-passed python heredoc that bit on MSYS | ||
| Git Bash (continuum-b69f's catch 2026-04-27): MSYS translates env | ||
| var values that look like Unix paths INTO the Windows-binary |
There was a problem hiding this comment.
This docstring claims the host_* block write is "Atomically" performed, but the function dumps JSON directly to the target path. To match the contract, consider writing to a temporary file and os.replace() it into place; otherwise, update the wording to reflect that it's a normal write.
| bash -c "source '$_adapters_extract'; $*" | ||
| AIRC_PYTHON="${AIRC_PYTHON:-python3}" \ | ||
| PYTHONPATH="${_airc_lib_dir}${PYTHONPATH:+:$PYTHONPATH}" \ | ||
| bash -c "source '$_adapters_file'; export AIRC_PYTHON='${AIRC_PYTHON:-python3}'; $*" |
There was a problem hiding this comment.
In _adapter_call, the inner bash -c sets AIRC_PYTHON using single quotes (export AIRC_PYTHON='${AIRC_PYTHON:-python3}'), which assigns the literal string ${AIRC_PYTHON:-python3} rather than expanding it. This will break adapters that rely on $AIRC_PYTHON (e.g. iso_to_epoch) on platforms where AIRC_PYTHON is not exactly python3. Prefer relying on the env var passed to bash -c, or export it without single quotes.
| bash -c "source '$_adapters_file'; export AIRC_PYTHON='${AIRC_PYTHON:-python3}'; $*" | |
| bash -c "source '$_adapters_file'; $*" |
fix(airc daemon): scope tracks cwd at install time, not always $HOME/.airc PR #200 follow-up. _daemon_scope was returning ${AIRC_HOME:-$HOME/.airc} unconditionally, but actual user state lives in $cwd/.airc per detect_scope(). So 'airc daemon install' from ~/continuum/ captured the wrong scope (~/.airc, empty), spawned a monitor that connected to nothing, user appeared offline despite 'RUNNING (PID xxx)' in status. Mirror detect_scope's logic exactly: AIRC_HOME if set, else cwd/.airc. Now 'airc daemon install' from a project dir captures THAT dir's .airc as the daemon's scope, launcher .bat sets AIRC_HOME=that, the spawned airc connect uses the right room state. Joel 2026-04-28 ~01:05Z caught this: 'lol obv if it worked you would have a monitor and be online. FAIL'.
…op) (#202) * fix(airc daemon): scope tracks cwd at install time, not always $HOME/.airc PR #200 follow-up. _daemon_scope was returning ${AIRC_HOME:-$HOME/.airc} unconditionally, but actual user state lives in $cwd/.airc per detect_scope(). So 'airc daemon install' from ~/continuum/ captured the wrong scope (~/.airc, empty), spawned a monitor that connected to nothing, user appeared offline despite 'RUNNING (PID xxx)' in status. Mirror detect_scope's logic exactly: AIRC_HOME if set, else cwd/.airc. Now 'airc daemon install' from a project dir captures THAT dir's .airc as the daemon's scope, launcher .bat sets AIRC_HOME=that, the spawned airc connect uses the right room state. Joel 2026-04-28 ~01:05Z caught this: 'lol obv if it worked you would have a monitor and be online. FAIL'. * fix(airc daemon): launcher cd's to cwd, skip AIRC_HOME (Windows fs view fix) Daemon installed via PR #200/#201 was still crashlooping (every 4s) because the launcher .bat set AIRC_HOME to a Windows-form path (C:\Users\green\continuum\.airc) which Git Bash's airc binary couldn't traverse cleanly downstream. Plus 'bash -lc' was reading login profile and re-exporting PATH which churned env. Restructured launcher .bat: 1. 'cd /d <cwd_win>' from cmd.exe so the bash subprocess inherits the project dir as pwd. detect_scope() then returns <cwd>/.airc the same way it does in the user's interactive shell. 2. Drop AIRC_HOME entirely — let detect_scope work normally. 3. 'bash -c' not 'bash -lc' — non-login skips profile, keeps the env we set in cmd uncorrupted. 4. Absolute Unix-form path to airc (cygpath -u) — bash -c doesn't read ~/.bashrc, so PATH may not include ~/.local/bin. 5. Errors log to daemon.err relative to cwd (already cd'd into it). Joel 2026-04-28 caught both the wrong-scope (PR #201) and now the crashloop. Verified locally: with this launcher shape, airc connect runs to completion + maintains the SSH tail to the host.
…203) (#204) fix(airc daemon): sentinel-marker for intentional re-exec (#203) Joel + continuum-b69f 2026-04-28: Windows daemon launcher's `:loop` respawned a fresh airc 5s after the original bash exited, racing the new airc that just took over via host-mode re-exec. Continuous crashloop on `airc daemon install` from a project dir whose room gist had a stale heartbeat (a common state on cold start). Root cause specific to Windows MSYS-bash: `exec env ... "$0" connect` is true execve on Linux/Mac (PID stays, parent never observes exit), but emulated as spawn-and-exit on Windows MSYS (parent bash exits + new airc bash takes over with a different PID). The daemon launcher's `bash -c "exec airc connect"` thus returns to the .bat after every host-takeover, which the .bat treats as a crash. Fix: - New helper `_write_reexec_marker` writes `<bashPID>:<unix-ts>` to `$AIRC_WRITE_DIR/airc.reexec-marker`. - Called immediately before all 5 `exec env ... "$0" connect ...` sites: 4 host-takeover paths (cmd_connect's stale-heartbeat self- heal in two different code paths × {rejoin-as-joiner, host}) + 1 cold-host split-brain race-loser path. - Daemon launcher .bat checks for the marker between iterations using `forfiles /p <scope> /m airc.reexec-marker /d 0` (file mtime today). If marker is fresh, the launcher prints a "re-exec'd; new process is now daemon, launcher exiting" message and exit /b 0 (no respawn). The new airc process from the exec is the running daemon now — competing-respawn would just kill it. On Linux/Mac the marker write is harmless: `exec` keeps the same PID, the parent bash never observes an exit, the launcher script (where applicable: launchd KeepAlive=true / systemd Restart=always) never sees the marker because it never re-enters its monitor loop. Trade-off: after intentional re-exec, the .bat exits → no auto- restart for crashes that happen LATER in the new airc's lifetime. User must wait until next logon or re-run `airc daemon install`. This is acceptable vs the current behavior (continuous crashloop after first re-exec). Future enhancement: .bat could transition to a "monitor mode" that polls airc.pid and only restarts if all PIDs in it are dead, but the simple exit-on-marker is the minimal viable fix for #203. Closes #203 once continuum-b69f re-tests on real Windows.
…arget 1, net -21) (#206) refactor(airc): _reexec_into helper consolidates 5 duplicated exec sites (#205 target 1) Net: -15 lines (38 deletions, 23 additions). First compression PR per #205's net-negative-diff mandate from Joel. Five sites in cmd_connect previously duplicated the same 3-line pattern: local _preserved_name; _preserved_name=\$(get_config_val name "") _write_reexec_marker exec env [\${AIRC_NO_DISCOVERY=1}] \${_preserved_name:+AIRC_NAME=\$_preserved_name} "\$0" connect <args> That's 5 × 3 = 15 lines of copy-paste, three of which were stale-host- takeover paths that diverged from the rejoin-race-loser paths only by the AIRC_NO_DISCOVERY=1 prefix. Plus 3 inline comment paragraphs explaining what the exec was for, also duplicated. Now: one helper _reexec_into <mode> <args>, mode in {rejoin, host}. Folds in the sentinel marker write (used to be its own _write_reexec_ marker function with 11-line block comment — collapsed into helper). Five call sites become one line each. Behavior unchanged: same env vars passed in same way, same exec arguments, same marker file written. Only the structure changed. Bonus: caller can no longer forget to call _write_reexec_marker before exec — the only path is via _reexec_into which always writes it. (Pre-fix, every new exec site was a fresh chance to forget it, which is exactly what triggered #203.) #205 Target 1 of 6. Joel's bar: every PR net-negative or extension- point-bearing. This is net-negative.
Six near-identical `if command -v cygpath ... else sed ...` blocks consolidated into two helpers. Each callsite goes from 5-6 lines to 1. cygpath when available; sed fallback for stripped-down environments. Sites collapsed (10 calls in 6 blocks): - airc:468-473 (resolve_tailscale_bin's where.exe fallback) - airc:5034-5039 (daemon installer: airc_bin_win + scope_win) - airc:5062-5067 (daemon launcher: cwd_win + airc_bin_unix) - airc:5079-5083 (daemon launcher: marker_win) - airc:5115-5119 (daemon launcher: launcher_win) - install.sh:464-470 (elevated payload .ps1 path) - install.sh:509-515 (elevated transcript log path) - install.sh:945-951 (ts_post_check tailscale where.exe) Net diff: +53 / -56 = **-3 lines** (just barely qualifies #205's net- negative bar — helper inline-duplication in install.sh ate most of the win because install.sh runs pre-clone and can't source from $CLONE_DIR yet). The win is in code quality, not line count: future cygpath sites call the helper, can't drift. Verified: airc daemon status works on Mac (sed-fallback path); install .sh runs clean (CI=true mode), all install jobs already green on canary. Continuum-b69f's #205 Target #3 — surface non-overlapping with his Target #1 (`_reexec_into` helper in cmd_connect's exec sites).
#208) refactor(airc): _self_heal_stale_host helper for stale-gist takeover (#205 target 4) Two near-identical 22-line blocks consolidated into one helper. Both were doing: random jitter → delete stale gist → re-list to detect race-loser → either rejoin via _reexec_into rejoin, or take over via _reexec_into host. ~22 lines × 2 sites = 44 lines duplicated. Helper is 24 lines (function body + 6 lines of doc comment + signature). Replace 2 sites with 1 line each. Net: +32 / -53 = **-21 lines.** Joel's framing 2026-04-28: "shell scripts are like classes." This helper is well-named (one job: heal a stale-host scenario) and the single call site form makes the intent obvious at the cmd_connect call site without a 22-line wall of self-heal mechanics inline.
target 2, net -40) (#209) refactor(airc): _daemon_install_done helper + trim duplicated comments (#205 target 2) Three platform daemon installers (launchd/systemd/schtasks) duplicated the same 5-line "Loaded into X / airc will auto-start / Logs / Status" print block. Plus the schtasks function had ~30 lines of comment paragraphs duplicating commit-history context (#200/#202 explanations). Now: one `_daemon_install_done` helper for the print footer, called by all three installers. Schtasks comment block trimmed to a 4-line summary that points at PR #202 for the bug-history detail. Behavior unchanged on every platform — same plist/unit/.bat content, same registration calls, same status output (just printed via the helper). #205 target 2 of 6.
…et -10) (#210) Diff stat: **+84 / -94 = -10 lines.** airc 5359 → 5283. Five new airc_core.config subcommands replace 5 inline-Python heredocs in airc bash: - `set --key K --value V` — generic single-key write - `unset_keys K1 K2 ...` — bulk-clear (used to nuke leftover host_*) - `read_parted` / `record_parted` / `clear_parted` — parted_rooms ops Bash side gains 3 one-line wrappers (`set_config_val`, `unset_config_keys`, plus the existing `_read/_record/_clear_parted_room` slimmed from 13-16 lines each to 1). Sites consolidated: - joiner-mode init (cmd_connect ~line 2225): 12-line heredoc → 4 set_config_val calls - host-mode init (cmd_connect ~line 2491): 14-line heredoc → 3 set_config_val + 1 unset_config_keys - _read_parted_rooms / _record_parted_room / _clear_parted_room: 3 × ~13 lines of inline Python → 1 line each Per Joel #205 + 'shell scripts are like classes': airc_core.config IS the config-state class; bash callers just dispatch to it. The class gains internal `_load`/`_save` helpers so each subcommand is 1-3 lines.
Two near-identical OS-detection functions: detect_platform in lib/airc_bash/platform_adapters.sh + _daemon_os in airc top-level. Both Darwin/Linux/MINGW switching with /proc/version WSL detection. Unify on detect_platform's surface, with output names from _daemon_os (darwin/linux/wsl/windows/unknown) since they match \`uname -s\` more directly than detect_platform's prior \"macos\"/\"windows-bash\". Changes: - detect_platform: rename outputs (macos→darwin, windows-bash→windows), inline the WSL check (\`grep ... && echo wsl || echo linux\`). - _daemon_os: deleted (16 lines). - 4 daemon callers: \$(_daemon_os) → \$(detect_platform). - cmd_doctor.sh _doctor_probe_sshd: case macos)→darwin), case windows-bash)→windows). Diff: +10 / -36 = **-26 lines.** airc 5283 → 5265. Verified: \`airc doctor\` still detects macOS Remote Login + Tailscale on Mac; \`airc daemon status\` still works (uses the new unified function).
refactor(airc-bash): extract cmd_connect to lib/airc_bash/ — Phase 3 file split
cmd_connect was the single largest block in the airc bash monolith (1355
lines, ~26% of the file). Splits it out verbatim into
lib/airc_bash/cmd_connect.sh, sourced via the same lib-dir resolver that
already loads cmd_doctor.sh and platform_adapters.sh.
airc: 5265 → 3911 lines (-1354)
lib/airc_bash/cmd_connect.sh: +1379 (1355 body + 24 header)
net: +25 (header + source-block overhead)
Behavior unchanged — cmd_connect calls airc top-level helpers (die,
ensure_init, get_config_val, set_config_val, relay_ssh, _reexec_into,
_self_heal_stale_host, spawn_general_sidecar_if_wanted, monitor, …)
and exposes only the cmd_connect function back to the dispatch.
Verified equivalence:
- bash -n on both files clean
- airc help / version / connect dispatch reach the same code paths
(same flag-parser error on `airc connect --help` as pristine canary)
- test/integration.sh tabs: 19/0 passing — full join + send + rename
+ peers flow under harness
This is Phase 0 of the structural decomposition Joel called out: the
"shell scripts are like classes" framing was applied to cmd_doctor and
platform_adapters but never to the bulk of cmd_X functions still inline.
Follow-ups will pull cmd_send / cmd_teardown / cmd_status / cmd_rooms /
cmd_rename / cmd_peers each into their own file. Eventually the airc
top-level retains only: bootstrap, helpers, dispatch.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…it (#214) refactor(airc-bash): extract cmd_daemon family — Phase 3 file split Pulls the cmd_daemon command group (cmd_daemon + cmd_daemon_install/ uninstall/status/log + 8 private _daemon_* helpers) out of the airc top-level into lib/airc_bash/cmd_daemon.sh, sourced via the same lib-dir resolver as cmd_doctor.sh / cmd_connect.sh / platform_adapters.sh. airc: 5265 → 4834 lines (-431) lib/airc_bash/cmd_daemon.sh: +461 (432 body + 29 header) Behavior unchanged. Cross-references resolve at call-time: - cmd_daemon.sh calls airc top-level helpers (die, detect_platform) - airc top-level (monitor self-heal, line ~1292) calls _daemon_installed defined in cmd_daemon.sh Verified: - bash -n on both files - airc daemon status — full plist/launchctl readout, log path correct Stacks alongside #213 (cmd_connect extraction). Each PR independently removes a major block from the bash monolith. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…plit (#215) refactor(airc-bash): extract cmd_send + cmd_ping — Phase 3 file split Pulls the outbound-message verbs (cmd_send + cmd_ping, 361 lines) out of the airc top-level into lib/airc_bash/cmd_send.sh, sourced via the same lib-dir resolver as cmd_connect / cmd_daemon / cmd_doctor. airc: 3504 → 3153 lines (-351) lib/airc_bash/cmd_send.sh: +383 (361 body + 22 header) cmd_send and cmd_ping are conceptually one group (ping is just send with a [PING:] marker that older clients gracefully degrade on); both go through the same envelope construction + queue-on-failure path, so they belong together. Behavior unchanged. Cross-references resolve at call-time: - cmd_send.sh calls airc top-level helpers (die, ensure_init, get_config_val, set_config_val, relay_ssh, get_host, …) - airc dispatch calls cmd_send / cmd_ping defined in cmd_send.sh Verified: - bash -n on both files - test/integration.sh tabs: 19/0 (one timing-flake on rename marker propagation that resolves on re-run; identical to canary HEAD behavior, not introduced here) Phase 0 progress (post this PR): airc top-level: 5265 → 3153 (-2112, -40%) lib/airc_bash: +2664 across cmd_connect / cmd_daemon / cmd_doctor / cmd_send / platform_adapters Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
refactor(airc-bash): extract cmd_teardown + cmd_disconnect — Phase 3 file split Pulls the leave/cleanup verbs (cmd_teardown + cmd_disconnect, 253 lines) out of the airc top-level into lib/airc_bash/cmd_teardown.sh. airc: 3153 → 2909 lines (-244) lib/airc_bash/cmd_teardown.sh: +273 (253 body + 20 header) Both verbs share the kill loop and split on what to clear afterwards (teardown wipes more aggressively; disconnect preserves identity + peers + history). Logically one group. Verified bash -n + smoke dispatch. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
refactor(airc-bash): extract cmd_status + cmd_logs — Phase 3 file split Pulls the introspection verbs (cmd_status + cmd_logs, 154 lines combined) out of the airc top-level into lib/airc_bash/cmd_status.sh. airc: 2909 → 2764 lines (-145) lib/airc_bash/cmd_status.sh: +170 (155 body + 15 header) cmd_status and cmd_logs were not contiguous in airc (cmd_logs lived ~30 lines below the cmd_doctor source-block); a single source-block in airc top-level now provides both functions. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
refactor(airc-bash): extract cmd_kick — Phase 3 file split Pulls cmd_kick (host-only peer eviction, 65 lines) into lib/airc_bash/cmd_kick.sh. Standalone — kick is host moderation, not identity — and extracting it first makes the surrounding identity block contiguous for the next extraction PR. airc: 2764 → 2711 lines (-53) lib/airc_bash/cmd_kick.sh: +82 (65 body + 17 header) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…md_whois) (#219) refactor(airc-bash): extract identity bundle — Phase 3 file split Pulls cmd_away + cmd_identity + cmd_whois + 7 _identity_* helpers (422 lines combined) out of the airc top-level into lib/airc_bash/cmd_identity.sh. airc: 2711 → 2290 lines (-421) lib/airc_bash/cmd_identity.sh: +448 (422 body + 26 header) The bundle was already cohesive — every helper is _identity_*, every public verb is about presence/persona — so one file rather than three. Verified: identity show / whoami / whois all dispatch correctly. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le/invite/peers) (#220) refactor(airc-bash): extract channel/peer cluster — Phase 3 file split Pulls cmd_rooms + cmd_part + cmd_send_file + cmd_invite + cmd_peers (413 lines combined) out of the airc top-level into lib/airc_bash/cmd_rooms.sh. airc: 2300 → 1898 lines (-402) lib/airc_bash/cmd_rooms.sh: +441 (413 body + 28 header) Bundled because in IRC mental model these are all the same conceptual surface ("what rooms exist? who's in this one? how do I leave/invite/ transfer?"). One domain = one file. Verified: airc rooms / peers / invite all dispatch correctly. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…221) Pulls the three remaining cmd_X groups out of airc top-level: - cmd_reminder → lib/airc_bash/cmd_reminder.sh (32 → 46 lines w/ header) - cmd_rename → lib/airc_bash/cmd_rename.sh (101 → 121) - cmd_update + cmd_channel + cmd_version → lib/airc_bash/cmd_update.sh (130 → 148) airc: 1898 → 1663 lines (-235) ## Phase 3 file split — final summary airc top-level: 5265 → 1663 lines (-3602, -68%) lib/airc_bash/: 0 → 4569 lines across 13 files cmd_connect.sh 1379 (join/pair/host orchestrator) cmd_daemon.sh 461 (autostart family + helpers) cmd_rooms.sh 441 (channel/peer cluster: rooms/part/invite/send-file/peers) cmd_doctor.sh 441 (env health + connect preflight) cmd_identity.sh 448 (presence: away/identity/whois + helpers) cmd_send.sh 383 (outbound: send + ping) cmd_teardown.sh 273 (leave/cleanup: teardown + disconnect) platform_adapters.sh 176 (proc_/port_/file_ adapters) cmd_status.sh 170 (introspection: status + logs) cmd_update.sh 148 (release info: update/channel/version) cmd_rename.sh 121 (identity name change w/ multi-scope propagation) cmd_kick.sh 82 (host-only peer eviction) cmd_reminder.sh 46 (idle-nudge cadence) What's left in airc top-level (1663 lines): - bootstrap (lib-dir resolver, env, source-blocks) - helpers (die, ensure_init, get_/set_config_val, resolve_name, relay_ssh, get_host, monitor + monitor self-heal, _hash, …) - dispatch case + help text Verified: full integration suite (tabs scenario) passing 19/0. Closes the structural decomposition Joel called for (2026-04-27): "shell scripts are like classes; never ever again make 5000 line dumbass designs." Future passes should decompose cmd_connect.sh internally (host-mode vs joiner-mode vs heartbeat are clearly separable) — the 1379-line connect file is now the single largest remaining block. But the bash monolith itself is gone. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What's in this bundle
31 commits since main, accumulated through canary cross-validation by the AI peer team (vhsm-d1f4, ideem-local-4bef, continuum-b69f, authenticator-fd63). 13 issues closed in the most recent session alone (#179, #132, #91, #163, #161, #145, #94, #97, #98, #99, #184, #142, plus the issue that filed #188).
Architecture: Python truth-layer (#152 Phase 0+1, PRs #166-#175)
`airc` (bash) is the user surface; business logic moves to `lib/airc_core/*` Python modules called via argparse CLIs:
Plus bash decomposition (`lib/airc_bash/*`) — `platform_adapters.sh`, `cmd_doctor.sh` extracted from the monolith. `airc` itself is now under 5000 lines (was ~5500).
All Python CLIs use argparse `--flags` for paths (per #174), not env vars — so MSYS path-translation on Git Bash is per-arg-predictable. `airc_core` carries the canonical types; bash callers are thin dispatchers.
Cross-machine substrate fixes (PRs #160, #164, #176-#178)
Cross-Mac/Windows airc messaging now verified end-to-end:
Windows install path fixes (PR #187 — closes #94, #97, #98, #99)
A Windows user can now do a clean install end-to-end:
Stability fixes
CI infrastructure (PRs #186, #187)
`.github/workflows/ci.yml` — five jobs on every PR + push:
`AIRC_SKIP_PREREQS` is NOT used — the CI tests the real install path on stock runners. `CI=true` auto-detection in install.sh skips sshd setup (osascript admin prompt hangs in CI) + tailscale install (slow, optional, no tailnet).
Smaller fixes
Risk profile
Test plan
🤖 Generated with Claude Code